A method for identifying a microorganism having a reduced adaptation to a particular environment comprising the steps of: (1)
providing a plurality of microorganisms each of which is independently mutated by the insertional inactivation of a gene with a nucleic acid
comprising a unique marker sequence so that each mutant contains a different marker sequence, or clones of the said microorganism; (2)
providing individually a stored sample of each mutant produced by step (1) and providing individually stored nucleic acid comprising the
unique marker sequence from each individual mutant; (3) introducing a plurality of mutants produced by step (1) into the said particular
environment and allowing those microorganisms which are able to do so to grow in the said environment; (4) retrieving microorganisms
from the said environment or a selected part thereof and isolating the nucleic acid from the retrieved microorganisms; (5) comparing any
marker sequences in the nucleic acid isolated in step (4) to the unique marker sequence of each individual mutant stored as in step (2); and
(6) selecting an individual mutant which does not contain any of the marker sequences as isolated in step (4).

IDENTIFICATION OF GENES

The present invention relates to methods for the identification of genes
involved in the adaptation of a microorganism to its environment,
particularly the identification of genes responsible for the virulence of a
pathogenic microorganism.

Background to the invention 

Antibiotic resistance in bacterial and other pathogens is becoming an
increasingly serious problem. It is therefore important to find new
therapeutic approaches to attack pathogenic microorganisms.

Pathogenic microorganisms have to evade the host's defence mechanisms
and be able to grow in a poor nutritional environment to establish an
infection. To do so, a number of "virulence" genes of the microorganism
are required.

Virulence genes have been detected using classical genetics, and a variety
of approaches have been used to exploit transposon mutagenesis for the
identification of bacterial virulence genes. For example, mutants have
been screened for defined physiological defects, such as the loss of
iron-regulated proteins (Holland et al, 1992), or in assays to study the
penetration of epithelial cells (Finlay et al, 1988) and survival within
macrophages (Fields et al, 1989; Miller et al, 1989a; Groisman et al,
1989). Transposon mutants have also been tested for altered virulence in
live animal models of infection (Miller et al, 1989b). This approach has
the advantage that genes can be identified which are important during
different stages of infection, but it is severely limited by the need to test a
wide range of mutants individually for alterations to virulence. Miller et
al (1989b) used groups of 8 to 10 mice and orally infected 95 separate
groups, each with a different mutant, thereby using between 760 and 950 mice.
Because of the extremely large numbers of animals required,
comprehensive screening of a bacterial genome for virulence genes has not
been feasible.

Recently a genetic system (in vivo expression technology [IVET]) was
described which positively selects for Salmonella genes that are
specifically induced during infection (Mahan et al, 1993). The technique
will identify genes that are expressed at a particular stage in the infection
process. However, it will not identify virulence genes that are regulated
post-transcriptionally and, more importantly, will not provide information
on whether the gene(s) identified are actually required
for, or contribute to, the infection process.


Lee & Falkow (1994) Methods Enzymol. 236, 531-545 describe a method
of identifying factors influencing the invasion of Salmonella into
mammalian cells in vitro by isolating hyperinvasive mutants.

Walsh and Cepko (1992) Science 255, 434-440 describe a method of
tracking the spatial location of cerebral cortical progenitor cells during the
development of the cerebral cortex in the rat. The Walsh and Cepko
method uses a tag that contains a unique nucleic acid sequence and the
lacZ gene, but there is no indication that useful mutants or genes could be
detected by their method.

WO 94/26933 and Smith et al (1995) Proc. Natl. Acad. Sci. USA 92,
6479-6483 describe methods aimed at the identification of the functional
regions of a known gene, or at least of a DNA molecule for which some
sequence information is available.

Groisman et al (1993) Proc. Natl Acad. Sci. USA 90, 1033-1037
describe the molecular, functional and evolutionary analysis of sequences
specific to Salmonella.

Some virulence genes are already known for pathogenic microorganisms
such as Escherichia coli, Salmonella typhimurium, Salmonella typhi,
Vibrio cholerae, Clostridium botulinum, Yersinia pestis, Shigella flexneri
and Listeria monocytogenes, but in all cases only a relatively small
proportion of the total have been identified.


The disease which Salmonella typhimurium causes in mice provides a good
experimental model of typhoid fever (Carter & Collins, 1974).
Approximately forty-two genes affecting Salmonella virulence have been
identified to date (Groisman & Ochman, 1994). These represent
approximately one third of the total number of predicted virulence genes
(Groisman and Saier, 1990).

The object of the present invention is to identify, with increased
efficiency, genes involved in the adaptation of a microorganism to its
environment, and particularly to identify further virulence genes in
pathogenic microorganisms. A further object is to reduce the number of
experimental animals used in identifying virulence genes. Still further
objects of the invention are to provide vaccines, and methods of screening
for drugs which reduce virulence.


Summary of the invention 

A first aspect of the invention provides a method for identifying a
microorganism having a reduced adaptation to a particular environment
comprising the steps of:
(1) providing a plurality of microorganisms each of which is
independently mutated by the insertional inactivation of a gene with a
nucleic acid comprising a unique marker sequence so that each mutant
contains a different marker sequence, or clones of the said microorganism;

(2) providing individually a stored sample of each mutant
produced by step (1) and providing individually stored nucleic acid
comprising the unique marker sequence from each individual mutant;

(3) introducing a plurality of mutants produced by step (1) into
the said particular environment and allowing those microorganisms which
are able to do so to grow in the said environment;

(4) retrieving microorganisms from the said environment or a
selected part thereof and isolating the nucleic acid from the retrieved
microorganisms;

(5) comparing any marker sequences in the nucleic acid isolated
in step (4) to the unique marker sequence of each individual mutant stored
as in step (2); and

(6) selecting an individual mutant which does not contain any of
the marker sequences as isolated in step (4).

Thus, the method uses negative selection to identify microorganisms with
reduced capacity to proliferate in the environment.
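In logical terms, steps (5) and (6) amount to a set difference between the markers introduced into the environment and the markers recovered from it. The Python sketch below is a minimal illustration under stated assumptions. Each mutant is identified by a hypothetical well label. Markers are treated as short strings rather than hybridisation signals, and detection is treated as all-or-nothing.

```python
def select_attenuated(input_pool: dict[str, str], recovered_tags: set[str]) -> list[str]:
    """Return the mutants whose unique marker was not recovered.

    input_pool: maps a hypothetical mutant label (e.g. a well) to its tag.
    recovered_tags: tags detected in nucleic acid from retrieved cells (step 4).
    """
    # Step (5): compare each stored marker with the recovered markers.
    # Step (6): keep the mutants whose marker is absent (negative selection).
    return sorted(m for m, tag in input_pool.items() if tag not in recovered_tags)


pool = {"A1": "ACGTTGCAAC", "A2": "TTGACCGGTA", "A3": "GGCATTACGT"}
recovered = {"ACGTTGCAAC", "GGCATTACGT"}
print(select_attenuated(pool, recovered))  # ['A2']
```

A mutant is reported only if its marker is absent. Mutants that grow normally therefore never appear in the output, which is what makes the screen a negative selection.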

A microorganism can live in a number of different environments, and it is
known that particular genes and their products allow the microorganism
to adapt to a particular environment. For example, in order for a
pathogenic microorganism, such as a pathogenic bacterium or pathogenic
fungus, to survive in its host, the product of one or more virulence genes
is required. Thus, in a preferred embodiment of the invention a gene of
a microorganism which allows the microorganism to adapt to a particular
environment is a virulence gene.

Conveniently, the particular environment is a differentiated multicellular
organism such as a plant or animal. Many bacteria and fungi are known
to infect plants, and they are able to survive within the plant and cause
disease because of the presence of, and expression from, virulence genes.

Suitable microorganisms when the particular environment is a plant
include the bacteria Agrobacterium tumefaciens, which forms tumours
(galls), particularly in grape; Erwinia amylovora; Pseudomonas
solanacearum, which causes wilt in a wide range of plants; Rhizobium
leguminosarum, which causes disease in beans; and Xanthomonas
campestris pv. citri, which causes canker in citrus fruits. They also include
the fungi Magnaporthe grisea, which causes rice blast disease; Fusarium
spp., which cause a variety of plant diseases; Erysiphe spp.;
Colletotrichum gloeosporioides; Gaeumannomyces graminis, which causes
root and crown diseases in cereals and grasses; Glomus spp.; Laccaria
spp.; Leptosphaeria maculans; Phoma tracheiphila; Phytophthora spp.;
Pyrenophora teres; Verticillium albo-atrum and V. dahliae; and
Mycosphaerella musicola and M. fijiensis. As described in more detail
below, when the microorganism is a fungus, its life cycle must include a
haploid phase.

Similarly, many microorganisms including bacteria, fungi, protozoa and
trypanosomes are known to infect animals, particularly mammals including
humans. Survival of the microorganism within the animal and the ability
of the microorganism to cause disease are due in large part to the presence
of, and expression from, virulence genes. Suitable bacteria include
Bordetella spp., particularly B. pertussis; Campylobacter spp., particularly
C. jejuni; Clostridium spp., particularly C. botulinum; Enterococcus spp.,
particularly E. faecalis; Escherichia spp., particularly E. coli;
Haemophilus spp., particularly H. ducreyi and H. influenzae;
Helicobacter spp., particularly H. pylori; Klebsiella spp., particularly
K. pneumoniae; Legionella spp., particularly L. pneumophila; Listeria
spp., particularly L. monocytogenes; Mycobacterium spp., particularly
M. smegmatis and M. tuberculosis; Neisseria spp., particularly
N. gonorrhoeae and N. meningitidis; Pseudomonas spp., particularly
Ps. aeruginosa; Salmonella spp.; Shigella spp.; Staphylococcus spp.,
particularly S. aureus; Streptococcus spp., particularly S. pyogenes and
S. pneumoniae; Vibrio spp.; and Yersinia spp., particularly Y. pestis.
All of these bacteria cause disease in man, and animal models of the
diseases exist. Thus, when these bacteria are used in the method of the
invention, the particular environment is an animal which they can infect
and in which they cause disease. For example, when Salmonella
typhimurium is used to infect a mouse, the mouse develops a disease which
serves as a model for typhoid fever in man. Staphylococcus aureus causes
bacteraemia and renal abscess formation in mice (Albus et al (1991)
Infect. Immun. 59, 1008-1014) and endocarditis in rabbits (Perlman &
Freedman (1971) Yale J. Biol. Med. 44, 206-213).

A fungus or higher eukaryotic parasite must be haploid for the relevant
parts of its life cycle (such as growth in the environment). Preferably,
a DNA-mediated integrative transformation system is available and, when
the microorganism is a human pathogen, conveniently an animal model of
the human disease is available. Suitable fungi pathogenic to humans
include certain Aspergillus spp. (for example A. fumigatus), Cryptococcus
neoformans and Histoplasma capsulatum. The above-mentioned fungi
have a haploid phase, and DNA-mediated integrative transformation
systems are available for them. Toxoplasma may also be used, being a
parasite with a haploid phase during infection. Bacteria have a haploid
genome.

Animal models of human disease are often available in which the animal
is a mouse, rat, rabbit, dog or monkey. It is preferred if the animal is a
mouse. Virulence genes detected by the method of the invention using an
animal model of a human disease are very likely to be genes which
determine the virulence of the microorganism in man.

Particularly preferred microorganisms for use in the methods of the
invention are Salmonella typhimurium, Staphylococcus aureus,
Streptococcus pneumoniae, Enterococcus faecalis, Pseudomonas
aeruginosa and Aspergillus fumigatus.

A preferred embodiment of the invention is now described.

A nucleic acid comprising a unique marker sequence is made as follows.
A complex pool of double-stranded DNA sequence "tags" is generated
using oligonucleotide synthesis and the polymerase chain reaction (PCR).
Each DNA "tag" has a unique sequence of between about 20 and 80 bp,
preferably about 40 bp, which is flanked by "arms" of about 15 to 30 bp,
preferably about 20 bp, which are common to all "tags". The unique
sequence is long enough to allow very large numbers of unique sequences
to be generated by random oligonucleotide synthesis, but not so long as
to allow the formation of secondary structures which may interfere with a
PCR. Similarly, the arms should be long enough to allow efficient priming
by oligonucleotides in a PCR.
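To make the tag architecture concrete, the Python sketch below builds tags from a random 40 bp unique region flanked by two common 20 bp arms. It is a sketch under assumptions. The arm sequences are hypothetical placeholders, not the arms used in the original work, and random choice stands in for degenerate oligonucleotide synthesis.

```python
import random

# Hypothetical common "arm" sequences (20 bp each), for illustration only.
LEFT_ARM = "TACCTACAACCTCAAGCTTA"
RIGHT_ARM = "TAAGCTTGGTTAGAATGGGT"


def make_tags(n: int, tag_len: int = 40, seed: int = 0) -> list[str]:
    """Generate n tagged sequences: LEFT_ARM + unique random region + RIGHT_ARM."""
    rng = random.Random(seed)
    seen: set[str] = set()
    tags: list[str] = []
    while len(tags) < n:
        core = "".join(rng.choice("ACGT") for _ in range(tag_len))
        if core in seen:
            continue  # every variable region must be unique
        seen.add(core)
        tags.append(LEFT_ARM + core + RIGHT_ARM)
    return tags


tags = make_tags(96)
print(len(tags), len(tags[0]))  # 96 80
```

A 40 bp random region has 4^40 possible sequences, so duplicates are vanishingly rare. The uniqueness check is included only to make that property explicit.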

It is well known that the sequence at the 5' end of an oligonucleotide
primer need not match the target sequence to be amplified.

It is usual that the PCR primers do not contain any complementary
structures with each other longer than 2 bases, especially at their 3' ends,
as this feature may promote the formation of an artifactual product called
"primer dimer". When the 3' ends of the two primers hybridize, they
form a "primed template" complex, and primer extension results in a short
duplex product called "primer dimer".
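The 2-base rule can be checked mechanically. The Python sketch below is a simplified illustration. It measures only perfectly aligned, gap-free overlaps between the two 3' ends, using the antiparallel condition `p1[-k:] == revcomp(p2[-k:])`. Real primer-design tools also score offset and mismatched pairings. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_overlap(p1: str, p2: str) -> int:
    """Longest k for which the last k bases of p1 pair with the last k bases
    of p2 in antiparallel orientation (aligned, gap-free overlaps only)."""
    p1 = p1.upper()
    best = 0
    for k in range(1, min(len(p1), len(p2)) + 1):
        if p1[-k:] == revcomp(p2[-k:]):
            best = k
    return best


def dimer_risk(p1: str, p2: str, max_allowed: int = 2) -> bool:
    """Flag a primer pair whose 3' ends can pair over more than max_allowed bases."""
    return three_prime_overlap(p1, p2) > max_allowed


print(three_prime_overlap("AAAAAAGCGC", "TTTTTTGCGC"))  # 4
print(dimer_risk("AAAAAAGCGC", "TTTTTTGCGC"))  # True
```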

Internal secondary structure should be avoided in primers. For symmetric
PCR, a 40-60% G+C content is often recommended for both primers,
with no long stretches of any one base. The classical melting temperature
calculations used in conjunction with DNA probe hybridization studies
often predict that a given primer should anneal at a specific temperature
or that the 72°C extension temperature will dissociate the primer/template
hybrid prematurely. In practice, the hybrids are more effective in the
PCR process than generally predicted by simple Tm calculations.
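The rules of thumb above are easy to compute. The Python sketch below reports G+C content and the longest single-base run for a primer. For the melting temperature it uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is one of the simple Tm estimates the text warns can be conservative. The homopolymer cut-off `max_run=4` is an assumed value, not one given in the text.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature in deg C by the Wallace rule,
    2 x (A + T) + 4 x (G + C); intended only for short oligonucleotides."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single base."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))


def primer_report(seq: str, max_run: int = 4) -> dict:
    """Summarise the simple checks; max_run is an assumed cut-off."""
    gc = gc_fraction(seq)
    run = longest_homopolymer(seq)
    return {
        "gc_fraction": gc,
        "gc_in_40_60_range": 0.40 <= gc <= 0.60,
        "longest_run": run,
        "long_run": run > max_run,
        "wallace_tm": wallace_tm(seq),
    }


print(primer_report("TACCTACAACCTCAAGCTTA"))
```

As the text notes, the optimum annealing temperature is best found empirically. A value such as `wallace_tm` is only a starting point.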

Optimum annealing temperatures may be determined empirically and may
be higher than predicted. Taq DNA polymerase does have activity in the
37-55°C region, so primer extension will occur during the annealing step
and the hybrid will be stabilized. The concentrations of the primers are
equal in conventional (symmetric) PCR and are typically within the
0.1 to 1 µM range.


The "tags" are ligated into a transposon or transposon-like element to
form the nucleic acid comprising a unique marker sequence.
Conveniently, the transposon is carried on a suicide vector which is
maintained as a plasmid in a "helper" organism but which is lost after
transfer to the microorganism of the method of the invention. For
example, the "helper" organism may be a strain of Escherichia coli, the
microorganism of the method may be Salmonella, and the transfer is a
conjugal transfer. Although the vector is lost after transfer, in a
proportion of cells the transposon undergoes a transposition event through
which it integrates at random, along with its unique tag, into the genome
of the microorganism used in the method. It is most preferred if the
transposon or transposon-like element can be selected. For example, in
the case of Salmonella, a kanamycin resistance gene may be present in the
transposon and exconjugants are selected on medium containing
kanamycin. It is also possible to complement an auxotrophic marker in the
recipient cell with a functional gene in the nucleic acid comprising the
unique marker. This method is particularly convenient when fungi are used
in the method.

Preferably the complementing functional gene is not derived from the
same species as the recipient microorganism; otherwise non-random
integration events may occur.

It is also particularly convenient if the transposon or transposon-like
element is carried on a vector which is maintained episomally (ie not as
part of the chromosome) in the microorganism used in the method of the
first aspect of the invention under a first given condition, whereas, upon
changing that condition to a second given condition, the episome cannot
be maintained, permitting the selection of a cell in which the transposon or
transposon-like element has undergone a transposition event through which
it integrates at random, along with its unique tag, into the genome
of the microorganism used in the method. This embodiment is
advantageous because, once a microorganism carrying the episomal vector
is made, then each time the transposition event is selected for or induced
by changing the condition of the microorganism (or a clone thereof) from
the first given condition to the second given condition, the transposon can
integrate at a different site in the genome of the microorganism. Thus,
once a master collection of microorganisms is made, each member of
which contains a unique tag sequence in the transposon or
transposon-like element carried on the episomal vector (when in the first
given condition), it can be used repeatedly to generate pools of random
insertional mutants, each of which contains a different tag sequence
(ie unique within the pool). This embodiment is particularly useful
because (a) it reduces the number and complexity of manipulations
required to generate the plurality ("pool") of independently mutated
microorganisms in step (1) of the method; and (b) the number of different
tags need only be the same as the number of microorganisms in the
plurality of microorganisms in step (1) of the method. Point (a) makes the
method easier to use in organisms for which transposon mutagenesis is
more difficult to perform (for example, Staphylococcus aureus), and point
(b) means that tag sequences with particularly good hybridisation
characteristics can be selected, making quality control easier. As is
described in more detail below, the "pool" size is conveniently about
100 or 200 independently mutated microorganisms and, therefore, the
master collection of microorganisms is conveniently stored in one or two
96-well microtitre plates.
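Each tagged mutant has a fixed address in the master collection. The Python sketch below illustrates this bookkeeping by mapping a running mutant index to a plate number and well. It assumes row-by-row filling of standard 8 × 12 plates (A1 to A12, then B1, and so on), which is a common convention and not something the text specifies.

```python
ROWS = "ABCDEFGH"  # 8 rows x 12 columns = 96 wells per plate


def well_position(index: int) -> tuple[int, str]:
    """Map a 0-based mutant index to (plate number, well label),
    filling each 96-well plate row by row (an assumed convention)."""
    plate, pos = divmod(index, 96)
    row, col = divmod(pos, 12)
    return plate + 1, f"{ROWS[row]}{col + 1}"


# A master collection of 192 tagged mutants occupies two plates.
layout = [well_position(i) for i in range(192)]
print(layout[0], layout[95], layout[96])  # (1, 'A1') (1, 'H12') (2, 'A1')
```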

In a particularly preferred embodiment the first given condition is a first
particular temperature or temperature range, such as 25°C to 32°C, most
preferably about 30°C, and the second given condition is a second
particular temperature or temperature range, such as 35°C to 45°C, most
preferably 42°C. In further preferred embodiments the first given
condition is the presence of an antibiotic, such as streptomycin, and the
second given condition is the absence of the said antibiotic; or the first
given condition is the absence of an antibiotic and the second given
condition is the presence of the said antibiotic.

Transposons suitable for integration into the genome of Gram-negative
bacteria include Tn5, Tn10 and derivatives thereof. Transposons suitable
for integration into the genome of Gram-positive bacteria include Tn916
and derivatives or analogues thereof. Transposons particularly suited for
use with Staphylococcus aureus include Tn917 (Cheung et al (1992) Proc.
Natl Acad. Sci. USA 89, 6462-6466) and Tn918 (Albus et al (1991)
Infect. Immun. 59, 1008-1014).

It is particularly preferred if the transposons have the properties of the
Tn917 derivatives described by Camilli et al (1990) J. Bacteriol. 172,
3738-3744, and are carried by a temperature-sensitive vector such as
pE194Ts (Villafane et al (1987) J. Bacteriol. 169, 4822-4829).

It will be appreciated that although transposons are convenient for
insertionally inactivating a gene, any other known method, or method
developed in the future, may be used. A further convenient method of
insertionally inactivating a gene, particularly in certain bacteria such as
Streptococcus, is insertion-duplication mutagenesis such as that
described in Morrison et al (1984) J. Bacteriol. 159, 870 with respect to
S. pneumoniae. The general method may also be applied to other
microorganisms, especially bacteria.

For fungi, insertional mutations are created by transformation using DNA
fragments or plasmids carrying the "tags" and, preferably, selectable
markers encoding, for example, resistance to hygromycin B or phleomycin
(see Smith et al (1994) Infect. Immun. 62, 5247-5254). Methods for the
random, single integration of DNA fragments encoding hygromycin B
resistance into the genome of filamentous fungi using restriction enzyme
mediated integration (REMI; Schiestl & Petes (1991); Lu et al (1994)
Proc. Natl Acad. Sci. USA 91, 12649-12653) are known.

A simple insertional mutagenesis technique for a fungus is described in
Schiestl & Petes (1994), incorporated herein by reference, and includes,
for example, the use of Ty elements and ribosomal DNA in yeast.

Random integration of the transposon or other DNA sequence allows
isolation of a plurality of independently mutated microorganisms wherein
a different gene is insertionally inactivated in each mutant and each mutant
contains a different marker sequence.


A library of such insertion mutants is arrayed in welled microtitre dishes
so that each well contains a different mutant microorganism. DNA
comprising the unique marker sequence from each individual mutant
microorganism (conveniently, the total DNA from the clone is used) is
stored. Conveniently, this is done by removing a sample of the
microorganism from the microtitre dish, spotting it onto a nucleic acid
hybridisation membrane (such as a nitrocellulose or nylon membrane),
lysing the microorganism in alkali and fixing the nucleic acid to the
membrane. Thus, a replica of the contents of the welled microtitre dishes
is made.

Pools of the microorganisms from the welled microtitre dish are made and
DNA is extracted. This DNA is used as a target for a PCR using primers
that anneal to the common "arms" flanking the "tags", and the amplified
DNA is labelled, for example with ³²P. The product of the PCR is used
to probe the DNA stored from each individual mutant to provide a
reference hybridisation pattern for the replicas of the welled microtitre
dishes. This checks that each of the individual microorganisms does,
in fact, contain a marker sequence and that the marker sequence can be
amplified and labelled efficiently.
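The reference hybridisation can serve as a quality-control filter before any pool is used. The Python sketch below flags wells whose spot gives a weak signal with the probe made from the input pool. It assumes spot intensities are already quantified as numbers, and it normalises to the membrane median. Both the normalisation and the 20% cut-off (`min_fraction`) are hypothetical choices, not values from the text.

```python
from statistics import median


def failed_input_check(signals: dict[str, float], min_fraction: float = 0.2) -> list[str]:
    """Return wells whose spot hybridises weakly with the input-pool probe,
    relative to the median spot on the membrane (hypothetical cut-off)."""
    reference = median(signals.values())
    return sorted(w for w, s in signals.items() if s < min_fraction * reference)


signals = {"A1": 950.0, "A2": 1020.0, "A3": 40.0, "A4": 880.0}
print(failed_input_check(signals))  # ['A3']
```

A mutant flagged here either lacks a marker or has a tag that amplifies or labels poorly. It cannot be scored reliably in the later negative selection.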

Pools of transposon mutants are made to introduce into the particular
environment. Conveniently, 96-well microtitre dishes are used and the
pool contains 96 transposon mutants. However, the lower limit for the
pool is two mutants; there is no theoretical upper limit to the size of the
pool but, as discussed below, the upper limit may be determined in
relation to the environment into which the mutants are introduced.

Once the microorganisms are introduced into the said particular
environment, those microorganisms which are able to do so are allowed to
grow in the environment. The length of time the microorganisms are left
in the environment is determined by the nature of the microorganism and
the environment. After a suitable length of time, the microorganisms are
recovered from the environment, DNA is extracted and the DNA is used
as a template for a PCR using primers that anneal to the "arms" flanking
the "tags". The PCR product is labelled, for example with ³²P, and is
used to probe the DNA stored from each individual mutant replicated from
the welled microtitre dish. Stored DNAs are identified which hybridise
weakly or not at all with the probe generated from the DNA isolated from
the microorganisms recovered from the environment. These non-hybridising
DNAs correspond to mutants whose adaptation to the particular
environment has been attenuated by insertion of the transposon or other
DNA sequence.
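Because mutants are called on hybridisation that is weak as well as absent, the comparison is quantitative in practice. The Python sketch below compares, well by well, each mutant's share of the total output-probe signal with its share of the input-probe signal, and reports wells where the ratio falls below a threshold. The share-based normalisation and the `max_ratio=0.1` threshold are assumptions for illustration, not values from the text.

```python
def attenuated_candidates(
    input_signal: dict[str, float],
    output_signal: dict[str, float],
    max_ratio: float = 0.1,
) -> list[str]:
    """Return wells whose share of the output-probe signal is below
    max_ratio times their share of the input-probe signal."""
    in_total = sum(input_signal.values())
    out_total = sum(output_signal.values())
    hits = []
    for well, s_in in input_signal.items():
        if s_in <= 0:
            continue  # no input signal: the mutant cannot be scored
        s_out = output_signal.get(well, 0.0)
        out_share = s_out / out_total if out_total > 0 else 0.0
        if out_share / (s_in / in_total) < max_ratio:
            hits.append(well)
    return sorted(hits)


before = {"A1": 1000.0, "A2": 1000.0, "A3": 1000.0}
after = {"A1": 1200.0, "A2": 30.0, "A3": 900.0}
print(attenuated_candidates(before, after))  # ['A2']
```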

In a particularly preferred embodiment the "arms" have no, or very little,
label compared to the "tags". For example, the PCR primers are suitably
designed to contain no, or a single, G residue, the ³²P-labelled nucleotide
is dGTP and, in this case, no or one radiolabelled C residue is
incorporated in each "arm" but a greater number of radiolabelled G
residues are incorporated in the "tag". It is preferred if the "tag" has at
least ten-fold more label incorporated than the "arms"; preferably
twenty-fold or more; more preferably fifty-fold or more. Conveniently,
the "arms" can be removed from the "tag" using a suitable restriction
enzyme, a site for which may be incorporated in the primer design.

As discussed above, a particularly preferred embodiment of the invention
is when the microorganism is a pathogenic microorganism and the
particular environment is an animal. In this embodiment, the size of the
pool of mutants introduced into the animal is determined by (a) the
number of cells of each mutant that are likely to survive in the animal
(assuming a virulence gene has not been inactivated) and (b) the total
inoculum of the microorganism. If the number in (a) is too low, false
positive results may arise, and if the number in (b) is too high, the
animal may die before enough mutants have had a chance to grow in the
desired way. The number of cells in (a) can be determined for each
microorganism used, but it is preferably more than 50, more preferably
more than 100.

The number of different mutants that can be introduced into a single
animal is preferably between 50 and 500, conveniently about 100. It is
preferred if the total inoculum does not exceed 10* cells (and it is
preferably 10⁵ cells), although the size of the inoculum may be varied
above or below this amount depending on the microorganism and the
animal.


In a particularly convenient method, an inoculum of 10⁵ cells, containing
1000 cells each of 100 different mutants, is used for a single animal. It
will be appreciated that in this method one animal can be used to screen
100 mutants, compared with prior art methods which require at least 100
animals to screen 100 mutants.

However, it is convenient to inoculate three animals with the same pool
of mutants so that at least two can be investigated (one as a replica to
check the reliability of the method), whilst the third is kept as a back-up.
Nevertheless, the method still provides a greater than 30-fold saving in the
number of animals used.
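The inoculum size and the saving in animals follow from simple arithmetic on the figures above. The Python sketch below reproduces that arithmetic: 100 mutants at 1000 cells each, and three animals per pool, compared with the prior-art minimum of one animal per mutant. The function name is illustrative.

```python
def screening_budget(mutants_per_pool: int, cells_per_mutant: int,
                     animals_per_pool: int) -> tuple[int, float]:
    """Total inoculum per animal, and fold reduction in animals compared
    with testing each mutant in (at least) one animal of its own."""
    inoculum = mutants_per_pool * cells_per_mutant
    fold_saving = mutants_per_pool / animals_per_pool
    return inoculum, fold_saving


inoculum, saving = screening_budget(100, 1000, 3)
print(inoculum, round(saving, 1))  # 100000 33.3
```

Against groups of 8 to 10 animals per mutant, as used by Miller et al (1989b), the same calculation would give a correspondingly larger saving.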

The time between the pool of mutants being introduced into the animal
and the microorganisms being recovered may vary with the microorganism
and animal used. For example, when the animal is a mouse and the
microorganism is Salmonella typhimurium, the time between inoculation
and recovery is about three days.

In one embodiment of the invention microorganisms are retrieved from the
environment in step (4) at a site remote from the site of introduction in
step (3), so that the virulence genes being investigated include those
involved in the spread of the microorganism between the two sites.

For example, in a plant the microorganism may be introduced in a lesion
in the stem or at one site on a leaf and the microorganism retrieved from
another site on the leaf where a disease state is indicated.

In the case of an animal, the microorganism may be introduced orally,
intraperitoneally, intravenously or intranasally and is retrieved at a later
time from an internal organ such as the spleen. It may be useful to
compare the virulence genes identified by oral administration and those
identified by intraperitoneal administration, as some genes may be required
to establish infection by one route but not by the other. It is preferred if
Salmonella is introduced intraperitoneally.


Other preferred environments which may be used to identify virulence
genes are animal cells in culture (particularly macrophages and epithelial
cells) and plant cells in culture. Although using cells in culture will be
useful in its own right, it will also complement the use of the whole
animal or plant, as the case may be, as the environment.

It is also preferred if the environment is a part of the animal body.
Within a given host-parasite interaction, a number of different
environments are possible, including different organs and tissues, and
parts thereof such as the Peyer's patches.


The number of individual microorganisms (ie cells) recovered from the
environment should be at least twice, preferably at least ten times, more
preferably 100 times the number of different mutants introduced into the
environment. For example, when an animal is inoculated with 100
different mutants, around 10,000 individual microorganisms should be
retrieved and their marker DNA isolated.
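One way to see why oversampling matters is a simple sampling model. This is an added illustration, not part of the original description. Assume every non-attenuated mutant is equally represented among the recovered cells and that sampling is Poisson. Recovering fold × n cells from a pool of n mutants then misses a given mutant purely by chance with probability e^(-fold), and each such miss would be scored as a false positive. The Python sketch below computes this under those assumptions.

```python
import math


def recovery_plan(n_mutants: int, fold: int) -> tuple[int, float, float]:
    """Cells to recover, the chance that one non-attenuated mutant is missed
    by sampling alone, and the expected number of such misses per pool,
    assuming equal representation and Poisson sampling."""
    cells = n_mutants * fold
    p_missing = math.exp(-cells / n_mutants)  # equals exp(-fold)
    return cells, p_missing, n_mutants * p_missing


for fold in (2, 10, 100):
    cells, p, expected = recovery_plan(100, fold)
    print(f"{fold}x: recover {cells} cells, P(miss)={p:.2g}, expected misses={expected:.2g}")
```

Under this model, twofold oversampling of a 100-mutant pool would leave more than ten mutants missing by chance, whereas 100-fold oversampling makes such losses negligible.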



A further preferred embodiment comprises the steps:

(1A) removing auxotrophs from the plurality of mutants produced in step
(1); or

(6A) determining whether the mutant selected in step (6) is an auxotroph;
or

both (1A) and (6A).

It is desirable to distinguish between an auxotroph (that is, a mutant
microorganism which requires growth factors not needed by the wild type
or by prototrophs) and a mutant microorganism in which a gene allowing
the microorganism to adapt to a particular environment is inactivated.
Conveniently, this is done between steps (1) and (2) or after step (6).

Preferably auxotrophs are not removed when virulence genes are being
identified.

A second aspect of the invention provides a method of identifying a gene
which allows a microorganism to adapt to a particular environment, the
method comprising the method of the first aspect of the invention,
followed by the additional step:

(7) isolating the insertionally inactivated gene or part thereof from the
individual mutant selected in step (6).

Methods for isolating a gene containing a unique marker are well known
in the art of molecular biology.

A further preferred embodiment comprises the further additional step:

(8) isolating from a wild-type microorganism the corresponding wild-type
gene using the insertionally inactivated gene isolated in step (7), or
part thereof, as a probe.

Methods for gene probing are well known in the art of molecular biology. 

Molecular biological methods suitable for use in the practice of the present
invention are disclosed in Sambrook et al (1989), incorporated herein by
reference.

When the microorganism is a microorganism pathogenic to an animal, the gene is a virulence gene and a transposon has been used to insertionally inactivate the gene, it is convenient to clone the virulence genes by digesting genomic DNA from the individual mutant selected in step (6) with a restriction enzyme which cuts outside the transposon, ligating size-fractionated DNA containing the transposon into a plasmid, and selecting plasmid recombinants on the basis of antibiotic resistance conferred by the transposon and not by the plasmid. The microorganism genomic DNA adjacent to the transposon is sequenced using two primers which anneal to the terminal regions of the transposon, and two primers which anneal close to the polylinker sequences of the plasmid. The sequences may be subjected to DNA database searches to determine whether the transposon has interrupted a known virulence gene. Thus, conveniently, sequence obtained by this method is compared against the sequences present in publicly available databases such as EMBL and GenBank. Finally, if the interrupted sequence appears to be in a new virulence gene, the mutation is transferred to a new genetic background (for example by phage P22-mediated transduction in the case of Salmonella) and the LD50 of the mutant strain is determined to confirm that the avirulent phenotype is due to the transposition event and not to a secondary mutation.
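Whether a restriction enzyme "cuts outside the transposon" can be checked on a known sequence by locating its recognition sites. The sketch below assumes a palindromic recognition site (so one strand suffices) and uses toy sequences; the EcoRI (GAATTC) and BamHI (GGATCC) sites are real, but the locus and function names are hypothetical.

```python
from typing import List


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of a recognition site on one strand.

    Searching one strand is enough only for palindromic sites such as EcoRI (GAATTC).
    """
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def cuts_outside_transposon(seq: str, site: str, tn_start: int, tn_end: int) -> bool:
    """True if no site overlaps the transposon [tn_start, tn_end) and at least one
    site lies on each flank, so a single fragment carries the whole transposon
    plus adjacent genomic DNA."""
    sites = find_sites(seq, site)
    inside = [p for p in sites if p < tn_end and p + len(site) > tn_start]
    left = [p for p in sites if p + len(site) <= tn_start]
    right = [p for p in sites if p >= tn_end]
    return not inside and bool(left) and bool(right)


# Hypothetical toy locus: flank + 12-bp stand-in for the transposon + flank.
left_flank, transposon, right_flank = "AAAGAATTCCCC", "TTTTGGGGTTTT", "CCCCGAATTCAA"
genome = left_flank + transposon + right_flank
tn_start, tn_end = len(left_flank), len(left_flank) + len(transposon)

print(find_sites(genome, "GAATTC"))                                # [3, 28]
print(cuts_outside_transposon(genome, "GAATTC", tn_start, tn_end))  # True (EcoRI)
print(cuts_outside_transposon(genome, "GGATCC", tn_start, tn_end))  # False (no BamHI sites)
```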

The number of individual mutants screened in order to detect all of the virulence genes in a microorganism depends on the number of genes in the genome of the microorganism. For example, it is likely that 3000-5000 mutants of Salmonella typhimurium need to be screened in order to detect the majority of virulence genes, whereas for Aspergillus spp., which have a larger genome than Salmonella, around 20 000 mutants are screened. Approximately 4% of non-essential S. typhimurium genes are thought to be required for virulence (Grossman & Saier, 1990) and, if so, the S. typhimurium genome contains approximately 150 virulence genes.
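The two figures quoted above fix the number of non-essential genes they implicitly assume. The text does not state that number; the short Python calculation below simply derives it, using exact fractions to avoid rounding noise.

```python
from fractions import Fraction

# Figures quoted in the text.
VIRULENCE_FRACTION = Fraction(4, 100)  # ~4% of non-essential genes (Grossman & Saier, 1990)
VIRULENCE_GENES = 150                  # estimated virulence genes in S. typhimurium


def implied_non_essential_genes(virulence_genes: int, fraction: Fraction) -> Fraction:
    """Number of non-essential genes consistent with the two quoted figures."""
    return virulence_genes / fraction


n = implied_non_essential_genes(VIRULENCE_GENES, VIRULENCE_FRACTION)
print(n)  # 3750
```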

However, the methods of the invention provide a faster, more convenient and much more practicable route to identifying virulence genes.

A third aspect of the invention provides a microorganism obtained using the method of the first aspect of the invention.

Such microorganisms are useful because they have the property of not being adapted to survive in a particular environment.

In a preferred embodiment, a pathogenic microorganism is not adapted to survive in a host organism (environment) and, in the case of microorganisms that are pathogenic to animals, particularly mammals, more particularly humans, the mutant obtained by the method of the invention may be used in a vaccine. The mutant is avirulent, and therefore expected to be suitable for administration to a patient, but it is expected to be antigenic and give rise to a protective immune response.

In a further preferred embodiment the pathogenic microorganism not adapted to survive in a host organism, obtained by the methods of the invention, is modified, preferably by the introduction of a suitable DNA sequence, to express an antigenic epitope from another pathogen. This modified microorganism can act as a vaccine for that other pathogen.

A fourth aspect of the invention provides a microorganism comprising a mutation in a gene identified using the method of the second aspect of the invention.

Thus, although the microorganism of the third aspect of the invention is useful, it is preferred if a mutation is specifically introduced into the identified gene. In a preferred embodiment, particularly when the microorganism is to be used in a vaccine, the mutation in the gene is a deletion, a frameshift mutation or any other mutation which is substantially incapable of reverting. Such gene-specific mutations can be made using standard procedures such as introducing into the microorganism a copy of the mutant gene on an autonomous replicon (such as a plasmid or viral genome) and relying on homologous recombination to introduce the mutation into the copy of the gene in the genome of the microorganism.

Fifth and sixth aspects of the invention provide a suitable microorganism for use in a vaccine and a vaccine comprising a suitable microorganism and a pharmaceutically-acceptable carrier.

The suitable microorganism is the aforementioned avirulent mutant. 

Active immunisation of the patient is preferred. In this approach, one or more mutant microorganisms are prepared in an immunogenic formulation containing suitable adjuvants and carriers and administered to the patient in known ways. Suitable adjuvants include Freund's complete or incomplete adjuvant, muramyl dipeptide, the "Iscoms" of EP 109 942, EP 180 564 and EP 231 039, aluminium hydroxide, saponin, DEAE-dextran, neutral oils (such as miglyol), vegetable oils (such as arachis oil), liposomes, Pluronic polyols or the Ribi adjuvant system (see, for example, GB-A-2 189 141). "Pluronic" is a Registered Trade Mark. The patient to be immunised is a patient who requires protection from the disease caused by the virulent form of the microorganism.

The aforementioned avirulent microorganisms of the invention, or a formulation thereof, may be administered by any conventional method including oral and parenteral (eg subcutaneous or intramuscular) injection. The treatment may consist of a single dose or a plurality of doses over a period of time.

Whilst it is possible for an avirulent microorganism of the invention to be administered alone, it is preferable to present it as a pharmaceutical formulation, together with one or more acceptable carriers. The carrier(s) must be "acceptable" in the sense of being compatible with the avirulent microorganism of the invention and not deleterious to the recipients thereof. Typically, the carriers will be water or saline which will be sterile and pyrogen free.

It will be appreciated that the vaccine of the invention, depending on its 
microorganism component, may be useful in the fields of human medicine 
and veterinary medicine. 

Diseases caused by microorganisms are known in many animals, such as domestic animals. The vaccines of the invention, when containing an appropriate avirulent microorganism, particularly an avirulent bacterium, are useful in man but also in, for example, cows, sheep, pigs, horses, dogs and cats, and in poultry such as chickens, turkeys, ducks and geese.

Seventh and eighth aspects of the invention provide a gene obtained by the 
method of the second aspect of the invention, and a polypeptide encoded 
thereby. 

By "gene" we include not only the regions of DNA that code for a polypeptide but also regulatory regions of DNA such as regions of DNA that regulate transcription, translation and, for some microorganisms, splicing of RNA. Thus, the gene includes promoters, transcription terminators, ribosome-binding sequences and, for some organisms, introns and splice recognition sites.

Typically, sequence information of the inactivated gene obtained in step (7) is derived. Conveniently, sequences close to the ends of the transposon are used as the hybridisation site of a sequencing primer. The derived sequence, or a DNA restriction fragment adjacent to the inactivated gene itself, is used to make a hybridisation probe with which to identify, and isolate from a wild-type organism, the corresponding wild-type gene.

It is preferred if the hybridisation probing is done under stringent conditions to ensure that the gene, and not a relative, is obtained. By "stringent" we mean that the gene hybridises to the probe when the gene is immobilised on a membrane and the probe (which, in this case, is > 200 nucleotides in length) is in solution, and the immobilised gene/hybridised probe is washed in 0.1 x SSC at 65 °C for 10 min. SSC is 0.15 M NaCl/0.015 M Na citrate.
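As a worked example of the wash-buffer definition, the salt concentrations in 0.1 x SSC follow directly from the 1 x SSC composition given above. The Python sketch below only performs that dilution arithmetic; the helper name is illustrative.

```python
from fractions import Fraction

# 1 x SSC as defined in the text: 0.15 M NaCl / 0.015 M sodium citrate.
SSC_1X_MOLAR = {"NaCl": Fraction(15, 100), "Na citrate": Fraction(15, 1000)}


def ssc_millimolar(strength: Fraction) -> dict:
    """Millimolar concentration of each SSC component at a given strength (1/10 for 0.1 x SSC)."""
    return {name: molar * strength * 1000 for name, molar in SSC_1X_MOLAR.items()}


wash = ssc_millimolar(Fraction(1, 10))  # buffer for the 65 deg C, 10 min stringent wash
print({name: float(mm) for name, mm in wash.items()})  # {'NaCl': 15.0, 'Na citrate': 1.5}
```

The stringent wash is therefore carried out in 15 mM NaCl / 1.5 mM sodium citrate.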

Preferred probe sequences for cloning Salmonella virulence genes are 
shown in Figures 5 and 6 and described in Example 2. 

In a particularly preferred embodiment the Salmonella virulence genes comprise the sequence shown in Figures 5 and 6 and described in Example 2.

In further preference the genes are those contained within, or at least part of which is contained within, the sequences shown in Figures 11 and 12 and which have been identified by the method of the second aspect of the invention. The sequences shown in Figures 11 and 12 are part of a gene cluster from Salmonella typhimurium which I have called virulence gene cluster 2 (VGC2). The positions of transposon insertions are indicated within the sequence, and these transposon insertions inactivate a virulence determinant of the organism. As is discussed more fully below, and in particular in Example 4, when the method of the second aspect of the invention is used to identify virulence genes in Salmonella typhimurium, many of the nucleic acid insertions (and therefore genes identified) are clustered in a relatively small part of the genome. This region, VGC2, contains other virulence genes which, as described below, form part of the invention.

The gene isolated by the method of the invention can be expressed in a suitable host cell. Thus, the gene (DNA) may be used in accordance with known techniques, appropriately modified in view of the teachings contained herein, to construct an expression vector, which is then used to transform an appropriate host cell for the expression and production of the polypeptide of the invention. Such techniques include those disclosed in US Patent Nos. 4,440,859 issued 3 April 1984 to Rutter et al, 4,530,901 issued 23 July 1985 to Weissman, 4,582,800 issued 15 April 1986 to Crowl, 4,677,063 issued 30 June 1987 to Mark et al, 4,678,751 issued 7 July 1987 to Goeddel, 4,704,362 issued 3 November 1987 to Itakura et al, 4,710,463 issued 1 December 1987 to Murray, 4,757,006 issued 12 July 1988 to Toole, Jr. et al, 4,766,075 issued 23 August 1988 to Goeddel et al and 4,810,648 issued 7 March 1989 to Stalker, all of which are incorporated herein by reference.

The DNA encoding the polypeptide constituting the compound of the invention may be joined to a wide variety of other DNA sequences for introduction into an appropriate host. The companion DNA will depend upon the nature of the host, the manner of the introduction of the DNA into the host, and whether episomal maintenance or integration is desired.

Generally, the DNA is inserted into an expression vector, such as a plasmid, in proper orientation and correct reading frame for expression. If necessary, the DNA may be linked to the appropriate transcriptional and translational regulatory control nucleotide sequences recognised by the desired host, although such controls are generally available in the expression vector. The vector is then introduced into the host through standard techniques. Generally, not all of the hosts will be transformed by the vector. Therefore, it will be necessary to select for transformed host cells. One selection technique involves incorporating into the expression vector a DNA sequence, with any necessary control elements, that codes for a selectable trait in the transformed cell, such as antibiotic resistance. Alternatively, the gene for such a selectable trait can be on another vector, which is used to co-transform the desired host cell.



Host cells that have been transformed by the recombinant DNA of the invention are then cultured for a sufficient time and under appropriate conditions known to those skilled in the art in view of the teachings disclosed herein to permit the expression of the polypeptide, which can then be recovered.

Many expression systems are known, including bacteria (for example E. coli and Bacillus subtilis), yeasts (for example Saccharomyces cerevisiae), filamentous fungi (for example Aspergillus), plant cells, animal cells and insect cells.



The vectors include a prokaryotic replicon, such as the ColE1 ori, for propagation in a prokaryote, even if the vector is to be used for expression in other, non-prokaryotic, cell types. The vectors can also include an appropriate promoter such as a prokaryotic promoter capable of directing the expression (transcription and translation) of the genes in a bacterial host cell, such as E. coli, transformed therewith.

A promoter is an expression control element formed by a DNA sequence that permits binding of RNA polymerase and transcription to occur. Promoter sequences compatible with exemplary bacterial hosts are typically provided in plasmid vectors containing convenient restriction sites for insertion of a DNA segment of the present invention.

WO 96/17951 PCT/GB95/02875

Typical prokaryotic vector plasmids are pUC18, pUC19, pBR322 and pBR329 available from Biorad Laboratories (Richmond, CA, USA) and pTrc99A and pKK223-3 available from Pharmacia, Piscataway, NJ, USA.

A typical mammalian cell vector plasmid is pSVL available from Pharmacia, Piscataway, NJ, USA. This vector uses the SV40 late promoter to drive expression of cloned genes, the highest level of expression being found in T antigen-producing cells, such as COS-1 cells.

An example of an inducible mammalian expression vector is pMSG, also available from Pharmacia. This vector uses the glucocorticoid-inducible promoter of the mouse mammary tumour virus long terminal repeat to drive expression of the cloned gene.

Useful yeast plasmid vectors are pRS403-406 and pRS413-416, which are generally available from Stratagene Cloning Systems, La Jolla, CA 92037, USA. Plasmids pRS403, pRS404, pRS405 and pRS406 are Yeast Integrating plasmids (YIps) and incorporate the yeast selectable markers HIS3, TRP1, LEU2 and URA3. Plasmids pRS413-416 are Yeast Centromere plasmids (YCps).

A variety of methods have been developed to operably link DNA to vectors via complementary cohesive termini. For instance, complementary homopolymer tracts can be added to the DNA segment to be inserted and to the vector DNA. The vector and DNA segment are then joined by hydrogen bonding between the complementary homopolymeric tails to form recombinant DNA molecules.

Synthetic linkers containing one or more restriction sites provide an alternative method of joining the DNA segment to vectors. The DNA segment, generated by endonuclease restriction digestion as described earlier, is treated with bacteriophage T4 DNA polymerase or E. coli DNA polymerase I, enzymes that remove protruding 3'-single-stranded termini with their 3'-5'-exonucleolytic activities, and fill in recessed 3'-ends with their polymerizing activities.

The combination of these activities therefore generates blunt-ended DNA segments. The blunt-ended segments are then incubated with a large molar excess of linker molecules in the presence of an enzyme that is able to catalyze the ligation of blunt-ended DNA molecules, such as bacteriophage T4 DNA ligase. Thus, the products of the reaction are DNA segments carrying polymeric linker sequences at their ends. These DNA segments are then cleaved with the appropriate restriction enzyme and ligated to an expression vector that has been cleaved with an enzyme that produces termini compatible with those of the DNA segment.
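Whether two cleaved ends are "compatible" can be worked out from each enzyme's recognition site and cut position. The Python sketch below assumes palindromic sites cut symmetrically and handles only blunt cutters and 5'-overhang cutters; the recognition and cut positions used (EcoRI G^AATTC, BamHI G^GATCC, BglII A^GATCT, SmaI CCC^GGG) are standard, while the class and function names are illustrative.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # palindromic recognition sequence, top strand 5'->3'
    cut: int   # top-strand cut falls after this many bases of the site

    def overhang(self) -> str:
        """Single-stranded 5' overhang ('' for a blunt cutter).

        For a palindromic site of length n cut after base c on the top strand,
        the bottom strand is cut after base n - c, leaving site[c:n - c] unpaired.
        """
        n = len(self.site)
        if self.cut > n - self.cut:
            raise ValueError(f"{self.name} leaves a 3' overhang; not handled here")
        return self.site[self.cut:n - self.cut]


def compatible(a: Enzyme, b: Enzyme) -> bool:
    """Ends can be ligated if both are blunt or the 5' overhangs base-pair."""
    return a.overhang() == reverse_complement(b.overhang())


ECORI = Enzyme("EcoRI", "GAATTC", 1)
BAMHI = Enzyme("BamHI", "GGATCC", 1)
BGLII = Enzyme("BglII", "AGATCT", 1)
SMAI = Enzyme("SmaI", "CCCGGG", 3)

print(BAMHI.overhang(), compatible(BAMHI, BGLII))  # GATC True
print(compatible(ECORI, BAMHI))                    # False
print(compatible(SMAI, SMAI))                      # True (blunt)
```

The BamHI/BglII pair illustrates that different enzymes can still produce compatible termini, which is why the text speaks of compatible termini rather than the same enzyme.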

Synthetic linkers containing a variety of restriction endonuclease sites are commercially available from a number of sources including International Biotechnologies Inc, New Haven, CN, USA.

A desirable way to modify the DNA encoding the polypeptide of the invention is to use the polymerase chain reaction as disclosed by Saiki et al (1988) Science 239, 487-491.

In this method the DNA to be enzymatically amplified is flanked by two specific oligonucleotide primers which themselves become incorporated into the amplified DNA. The said specific primers may contain restriction endonuclease recognition sites which can be used for cloning into expression vectors using methods known in the art.
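The design of PCR primers that carry restriction sites can be sketched as follows. This is a minimal Python illustration under stated assumptions: the template is hypothetical, EcoRI and BamHI are arbitrary example sites, and the two-base "GC" clamp added 5' of each site is an assumed convention (extra bases are commonly added so that the enzyme can cut near the end of the product).

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def cloning_primers(template: str, anneal_len: int, fwd_site: str, rev_site: str,
                    clamp: str = "GC") -> tuple:
    """Return (forward, reverse) primers, both written 5'->3'.

    Each primer = clamp + restriction site + template-annealing part. The forward
    primer copies the first anneal_len bases of the top strand; the reverse primer
    is the reverse complement of the last anneal_len bases.
    """
    for site in (fwd_site, rev_site):
        # The enzyme would also cut inside the insert if its site were already present
        # (checking one strand suffices for palindromic sites).
        if site.upper() in template.upper():
            raise ValueError(f"site {site} occurs inside the template")
    forward = clamp + fwd_site + template[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(template[-anneal_len:])
    return forward, reverse


# Hypothetical 39-bp template; EcoRI (GAATTC) and BamHI (GGATCC) are example choices.
template = "ATGACCGGTTTACGCAAAGCTTGGCGTACGTTAGCCTAA"
fwd, rev = cloning_primers(template, 12, "GAATTC", "GGATCC")
print(fwd)  # GCGAATTCATGACCGGTTTA
print(rev)  # GCGGATCCTTAGGCTAACGT
```

Using a different site on each primer also fixes the orientation of the insert in the vector.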

Variants of the genes also form part of the invention. It is preferred if the variant has at least 70% sequence identity, more preferably at least 85% sequence identity, most preferably at least 95% sequence identity with the genes isolated by the method of the invention. Of course, replacements, deletions and insertions may be tolerated. The degree of similarity between one nucleic acid sequence and another can be determined using the GAP program of the University of Wisconsin Genetics Computer Group.
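For illustration only, percent identity from a global alignment can be computed as sketched below. GAP is generally described as a Needleman-Wunsch global aligner, but it uses its own scoring scheme and reporting conventions; the match, mismatch and gap values here, and the choice of the shorter sequence as denominator, are assumptions, so the figures will not necessarily match GAP output.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the shorter sequence (assumed convention)."""
    top, bottom = global_align(a.upper(), b.upper())
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * matches / min(len(a), len(b))


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

A variant would then meet the preferred thresholds if the returned value is at least 70, 85 or 95.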

Similarly, variants of proteins encoded by the genes are included. 

By "variants" we include insertions, deletions and substitutions, either conservative or non-conservative, where such changes do not substantially alter the normal function of the protein.

By "conservative substitutions" is intended combinations such as Gly, Ala; Val, Ile, Leu; Asp, Glu; Asn, Gln; Ser, Thr; Lys, Arg; and Phe, Tyr.

Such variants may be made using the well known methods of protein engineering and site-directed mutagenesis.
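The groups of conservative substitutions listed above can be encoded directly to classify each substitution in a variant. This Python sketch uses only the groups named in the text (other groupings exist) and, as an assumption, handles substitutions only, not insertions or deletions; the function names are illustrative.

```python
from typing import List, Tuple

# Conservative substitution groups exactly as listed in the text.
CONSERVATIVE_GROUPS = [
    {"Gly", "Ala"}, {"Val", "Ile", "Leu"}, {"Asp", "Glu"}, {"Asn", "Gln"},
    {"Ser", "Thr"}, {"Lys", "Arg"}, {"Phe", "Tyr"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same listed group (identity is not a substitution)."""
    if original == replacement:
        return False
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(wild_type: List[str],
                           variant: List[str]) -> List[Tuple[int, str, str, str]]:
    """Compare equal-length lists of three-letter codes; return (position, wt, new, kind)."""
    if len(wild_type) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    result = []
    for pos, (wt, new) in enumerate(zip(wild_type, variant), start=1):
        if wt != new:
            kind = "conservative" if is_conservative(wt, new) else "non-conservative"
            result.append((pos, wt, new, kind))
    return result


print(classify_substitutions(["Gly", "Val", "Asp", "Lys"], ["Ala", "Val", "Asn", "Arg"]))
# [(1, 'Gly', 'Ala', 'conservative'), (3, 'Asp', 'Asn', 'non-conservative'),
#  (4, 'Lys', 'Arg', 'conservative')]
```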

A ninth aspect of the invention provides a method of identifying a compound which reduces the ability of a microorganism to adapt to a particular environment, comprising the step of selecting a compound which interferes with the function of (1) a gene obtained by the method of the second aspect of the invention or (2) a polypeptide encoded by such a gene.

Pairwise screens for compounds which affect the wild-type cell but not a cell overexpressing a gene isolated by the methods of the invention form part of this aspect of the invention.

For example, in one embodiment one cell is a wild-type cell and a second cell is a Salmonella cell which is made to overexpress the gene isolated by the method of the invention. The viability and/or growth of each cell in the particular environment is determined in the presence of a compound to be tested, to identify which compound reduces the viability or growth of the wild-type cell but not of the cell overexpressing the said gene.

It is preferred if the gene is a virulence gene. 

For example, in one embodiment the microorganism (such as S. typhimurium) is made to overexpress the virulence gene identified by the method of the first aspect of the invention. Each of (a) the "over-expressing" microorganism and (b) an equivalent microorganism (which does not over-express the virulence gene) is used to infect cells in culture. Whether a particular test compound will selectively inhibit the virulence gene function is determined by assessing the amount of the test compound which is required to prevent infection of the host cells by (a) the over-expressing microorganism and (b) the equivalent microorganism (at least for some virulence gene products it is envisaged that the test compound will inactivate them, and itself be inactivated, by binding to the virulence gene product). If more of the compound is required to prevent infection by (a) than by (b), this suggests that the compound is selective. It is preferred if the microorganisms (such as Salmonella) remaining outside the host cells are destroyed by a mild antibiotic such as gentamicin (which does not penetrate host cells), and if the effect of the test compound in preventing infection of the cell by the microorganism is determined by lysing the said cell and determining how many microorganisms are present (for example by plating on agar).
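The read-out of this infection assay can be sketched as two small calculations: a viable count from the agar plates, and a comparison of the amounts of compound needed to block infection by (a) and by (b). In this Python sketch the twofold cut-off is a hypothetical threshold (the text only requires that more compound be needed for (a)), and the doses and colony counts are made-up illustrative numbers.

```python
def selectivity_ratio(dose_overexpresser: float, dose_equivalent: float) -> float:
    """Amount of compound needed to block infection by (a) divided by that for (b)."""
    if dose_overexpresser <= 0 or dose_equivalent <= 0:
        raise ValueError("doses must be positive")
    return dose_overexpresser / dose_equivalent


def looks_selective(dose_a: float, dose_b: float, min_ratio: float = 2.0) -> bool:
    """Flag a compound when (a) needs clearly more compound than (b).

    min_ratio is a hypothetical cut-off; the text only says 'more'.
    """
    return selectivity_ratio(dose_a, dose_b) >= min_ratio


def cfu_per_ml(colonies: int, dilution: float, volume_plated_ml: float) -> float:
    """Viable count of a host-cell lysate: colonies / (dilution x volume plated)."""
    return colonies / (dilution * volume_plated_ml)


print(looks_selective(8.0, 2.0), looks_selective(2.5, 2.0))  # True False
print(round(cfu_per_ml(45, 1e-3, 0.1)))                       # 450000
```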

Pairwise screens and other screens for compounds are generally disclosed in Kirsch & Di Domenico (1993) in "The Discovery of Natural Products with a Therapeutic Potential" (Ed. V.P. Gallo), Chapter 6, pages 177-221, Butterworths, U.K. (incorporated herein by reference).

Pairwise screens can be designed in a number of related formats in which the relative sensitivity to a compound is compared using two genetically related strains. If the strains differ at a single locus, then a compound specific for that target can be identified by comparing each strain's sensitivity to the inhibitor. For example, inhibitors specific to the target will be more active against a super-sensitive test strain when compared to an otherwise isogenic sister strain. In an agar diffusion format, this is determined by measuring the size of the zone of inhibition surrounding the disc or well carrying the compound. Because of diffusion, a continuous concentration gradient of compound is set up, and the strain's sensitivity to inhibitors is proportional to the distance from the disc or well to the edge of the zone. General antimicrobials, or antimicrobials with modes of action other than the desired one, are generally observed as having similar activities against the two strains.

Another type of molecular genetic screen involves pairs of strains in which a cloned gene product is overexpressed in one strain compared to a control strain. The rationale behind this type of assay is that the strain containing an elevated quantity of the target protein should be more resistant to inhibitors specific to the cloned gene product than an isogenic strain containing normal amounts of the target protein. In an agar diffusion assay, the zone size surrounding a specific compound is expected to be smaller in the strain overexpressing the target protein than in an otherwise isogenic strain.
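The interpretation rules for both agar diffusion formats described above can be summarised in one function. This Python sketch assumes zone diameters in millimetres and a hypothetical 2 mm tolerance for "similar" zones; the labels returned are illustrative, not taken from the cited screens.

```python
def classify_zone_pair(zone_control_mm: float, zone_test_mm: float,
                       test_strain: str, tolerance_mm: float = 2.0) -> str:
    """Interpret inhibition-zone sizes for a test strain versus its isogenic control.

    test_strain is "supersensitive" (specific inhibitors give a LARGER zone) or
    "overexpressing" (specific inhibitors give a SMALLER zone).
    tolerance_mm is a hypothetical threshold for calling two zones similar.
    """
    diff = zone_test_mm - zone_control_mm
    if abs(diff) <= tolerance_mm:
        return "non-specific"  # similar activity against both strains
    if test_strain == "supersensitive":
        return "target-specific" if diff > 0 else "unexpected"
    if test_strain == "overexpressing":
        return "target-specific" if diff < 0 else "unexpected"
    raise ValueError("test_strain must be 'supersensitive' or 'overexpressing'")


print(classify_zone_pair(12, 20, "supersensitive"))  # target-specific
print(classify_zone_pair(18, 11, "overexpressing"))  # target-specific
print(classify_zone_pair(15, 16, "overexpressing"))  # non-specific
```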

Additionally or alternatively, selection of a compound is achieved in the following steps:

1. A mutant microorganism obtained using the method of the first aspect of the invention is used as a control (it has a given phenotype, for example, avirulence).

2. A compound to be tested is mixed with the wild-type microorganism.

3. The wild-type microorganism is introduced into the environment (with or without the test compound).

4. If the wild-type microorganism is unable to adapt to the environment (following treatment by, or in the presence of, the compound), the compound is one which reduces the ability of the microorganism to adapt to, or survive in, the particular environment.

When the environment is an animal body and the microorganism is a pathogenic microorganism, the compound identified by this method can be used in a medicament to prevent or ameliorate infection with the microorganism.

A tenth aspect of the invention therefore provides a compound identifiable by the method of the ninth aspect.

It will be appreciated that uses of the compound of the tenth aspect are related to the method by which it can be identified, and in particular in relation to the host of a pathogenic microorganism. For example, if the compound is identifiable by a method which uses a virulence gene, or polypeptide encoded thereby, from a bacterium which infects a mammal, the compound may be useful in treating infection of a mammal by that bacterium.

Similarly, if the compound is identifiable by a method which uses a virulence gene, or polypeptide encoded thereby, from a fungus which infects a plant, the compound may be useful in treating infection of a plant by that fungus.

An eleventh aspect of the invention provides a molecule which selectively interacts with, and substantially inhibits the function of, a gene of the seventh aspect of the invention or a nucleic acid product thereof.

By "nucleic acid product thereof" we include any RNA, especially mRNA, transcribed from the gene.

Preferably a molecule which selectively interacts with, and substantially inhibits the function of, said gene or said nucleic acid product is an antisense nucleic acid or nucleic acid derivative.

More preferably, said molecule is an antisense oligonucleotide.

Antisense oligonucleotides are single-stranded nucleic acids which can specifically bind to a complementary nucleic acid sequence. By binding to the appropriate target sequence, an RNA-RNA, a DNA-DNA or an RNA-DNA duplex is formed. These nucleic acids are often termed "antisense" because they are complementary to the sense or coding strand of the gene.

Recently, formation of a triple helix has proven possible where the oligonucleotide is bound to a DNA duplex. It was found that oligonucleotides could recognise sequences in the major groove of the DNA double helix. A triple helix was formed thereby. This suggests that it is possible to synthesise sequence-specific molecules which specifically bind double-stranded DNA via recognition of major groove hydrogen bonding sites.

Clearly, the sequence of the antisense nucleic acid or oligonucleotide can readily be determined by reference to the nucleotide sequence of the gene in question. For example, antisense nucleic acids or oligonucleotides can be designed which are complementary to a part of the sequence shown in Figures 11 or 12, especially to sequences which form a part of a virulence gene.
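Deriving an antisense sequence from the sense strand amounts to taking the reverse complement of the chosen window. The Python sketch below uses a hypothetical sense-strand fragment, since the sequences of Figures 11 and 12 are not reproduced here; it accepts only A, C, G and T, and the function name is illustrative.

```python
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_oligo(sense: str, start: int, length: int) -> str:
    """Antisense DNA oligonucleotide (5'->3') complementary to sense[start:start + length].

    `sense` is the coding (sense) strand written 5'->3', so the antisense oligo is its
    reverse complement and will also pair with the corresponding mRNA.
    """
    window = sense.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    return "".join(_PAIR[base] for base in reversed(window))


# Hypothetical sense-strand fragment (not taken from Figures 11 or 12).
sense = "ATGGCTAAAGGTCTGACC"
print(antisense_oligo(sense, 0, 9))  # TTTAGCCAT
```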

Oligonucleotides are subject to being degraded or inactivated by cellular endogenous nucleases. To counter this problem, it is possible to use modified oligonucleotides, eg having altered internucleotide linkages, in which the naturally occurring phosphodiester linkages have been replaced with another linkage. For example, Agrawal et al (1988) Proc. Natl Acad. Sci. USA 85, 7079-7083 showed increased inhibition in tissue culture of HIV-1 using oligonucleotide phosphoramidates and phosphorothioates. Sarin et al (1988) Proc. Natl Acad. Sci. USA 85, 7448-7451 demonstrated increased inhibition of HIV-1 using oligonucleotide methylphosphonates. Agrawal et al (1989) Proc. Natl Acad. Sci. USA 86, 7790-7794 showed inhibition of HIV-1 replication in both early-infected and chronically infected cell cultures, using nucleotide sequence-specific oligonucleotide phosphorothioates. Leiter et al (1990) Proc. Natl Acad. Sci. USA 87, 3430-3434 report inhibition in tissue culture of influenza virus replication by oligonucleotide phosphorothioates.

Oligonucleotides having artificial linkages have been shown to be resistant to degradation in vivo. For example, Shaw et al (1991) Nucleic Acids Res. 19, 747-750 report that otherwise unmodified oligonucleotides become more resistant to nucleases in vivo when they are blocked at the 3' end by certain capping structures, and that uncapped oligonucleotide phosphorothioates are not degraded in vivo.

A detailed description of the H-phosphonate approach to synthesizing oligonucleoside phosphorothioates is provided in Agrawal and Tang (1990) Tetrahedron Letters 31, 7541-7544, the teachings of which are hereby incorporated herein by reference. Syntheses of oligonucleoside methylphosphonates, phosphorodithioates, phosphoramidates, phosphate esters, bridged phosphoramidates and bridged phosphorothioates are known in the art. See, for example, Agrawal and Goodchild (1987) Tetrahedron Letters 28, 3539; Nielsen et al (1988) Tetrahedron Letters 29, 2911; Jager et al (1988) Biochemistry 27, 7237; Uznanski et al (1987) Tetrahedron Letters 28, 3401; Bannwarth (1988) Helv. Chim. Acta 71, 1517; Cosstick and Vyle (1989) Tetrahedron Letters 30, 4693; Agrawal et al (1990) Proc. Natl Acad. Sci. USA 87, 1401-1405, the teachings of which are incorporated herein by reference. Other methods for synthesis or production also are possible. In a preferred embodiment the oligonucleotide is a deoxyribonucleic acid (DNA), although ribonucleic acid (RNA) sequences may also be synthesized and applied.

The oligonucleotides useful in the invention preferably are designed to resist degradation by endogenous nucleolytic enzymes. In vivo degradation of oligonucleotides produces oligonucleotide breakdown products of reduced length. Such breakdown products are more likely to engage in non-specific hybridization and are less likely to be effective, relative to their full-length counterparts. Thus, it is desirable to use oligonucleotides that are resistant to degradation in the body and which are able to reach the targeted cells. The present oligonucleotides can be rendered more resistant to degradation in vivo by substituting one or more internal artificial internucleotide linkages for the native phosphodiester linkages, for example, by replacing a non-bridging oxygen of the phosphate with sulphur in the linkage. Examples of linkages that may be used include phosphorothioates, methylphosphonates, sulphone, sulphate, ketyl, phosphorodithioates, various phosphoramidates, phosphate esters, bridged phosphorothioates and bridged phosphoramidates. Such examples are illustrative, rather than limiting, since other internucleotide linkages are known in the art. See, for example, Cohen (1990) Trends in Biotechnology. The synthesis of oligonucleotides having one or more of these linkages substituted for the phosphodiester internucleotide linkages is well known in the art, including synthetic pathways for producing oligonucleotides having mixed internucleotide linkages.

Oligonucleotides can be made resistant to extension by endogenous enzymes by "capping" or incorporating similar groups on the 5' or 3' terminal nucleotides. A reagent for capping is commercially available as Amino-Link II™ from Applied BioSystems Inc, Foster City, CA. Methods for capping are described, for example, by Shaw et al (1991) Nucleic Acids Res. 19, 747-750 and Agrawal et al (1991) Proc. Natl Acad. Sci. USA 88(17), 7595-7599, the teachings of which are hereby incorporated herein by reference.

A further method of making oligonucleotides resistant to nuclease attack is for them to be "self-stabilized" as described by Tang et al (1993) Nucl. Acids Res. 21, 2729-2735, incorporated herein by reference. Self-stabilized oligonucleotides have hairpin loop structures at their 3' ends, and show increased resistance to degradation by snake venom phosphodiesterase, DNA polymerase I and fetal bovine serum. The self-stabilized region of the oligonucleotide does not interfere in hybridization with complementary nucleic acids, and pharmacokinetic and stability studies in mice have shown increased in vivo persistence of self-stabilized oligonucleotides with respect to their linear counterparts.

In accordance with the invention, the inherent binding specificity of antisense oligonucleotides characteristic of base pairing is enhanced by limiting the availability of the antisense compound to its intended locus in vivo, permitting lower dosages to be used and minimizing systemic effects. Thus, oligonucleotides are applied locally to achieve the desired effect. The concentration of the oligonucleotides at the desired locus is much higher than if the oligonucleotides were administered systemically, and the therapeutic effect can be achieved using a significantly lower total amount. The local high concentration of oligonucleotides enhances penetration of the targeted cells and effectively blocks translation of the target nucleic acid sequences.

The oligonucleotides can be delivered to the locus by any means appropriate for localized administration of a drug. For example, a solution of the oligonucleotides can be injected directly to the site or can be delivered by infusion using an infusion pump. The oligonucleotides also can be incorporated into an implantable device which, when placed at the desired site, permits the oligonucleotides to be released into the surrounding locus.

The oligonucleotides are most preferably administered via a hydrogel
material. The hydrogel is noninflammatory and biodegradable. Many
such materials now are known, including those made from natural and
synthetic polymers. In a preferred embodiment, the method exploits a
hydrogel which is liquid below body temperature but gels to form a
shape-retaining semisolid hydrogel at or near body temperature. Preferred
hydrogels are polymers of ethylene oxide-propylene oxide repeating units.
The properties of the polymer are dependent on the molecular weight of
the polymer and the relative percentage of polyethylene oxide and
polypropylene oxide in the polymer. Preferred hydrogels contain from
about 10 to about 80% by weight ethylene oxide and from about 20 to
about 90% by weight propylene oxide. A particularly preferred hydrogel
contains about 70% polyethylene oxide and 30% polypropylene oxide.
Hydrogels which can be used are available, for example, from BASF
Corp., Parsippany, NJ, under the tradename Pluronic™.

In this embodiment, the hydrogel is cooled to a liquid state and the
oligonucleotides are admixed into the liquid to a concentration of about 1
mg oligonucleotide per gram of hydrogel. The resulting mixture then is
applied onto the surface to be treated, for example by spraying or painting
during surgery or using a catheter or endoscopic procedures. As the
polymer warms, it solidifies to form a gel, and the oligonucleotides diffuse
out of the gel into the surrounding cells over a period of time defined by
the exact composition of the gel.

The oligonucleotides can be administered by means of other implants that
are commercially available or described in the scientific literature,
including liposomes, microcapsules and implantable devices. For
example, implants made of biodegradable materials such as
polyanhydrides, polyorthoesters, polylactic acid and polyglycolic acid and
copolymers thereof, collagen, and protein polymers, or non-biodegradable
materials such as ethylene vinyl acetate (EVAc), polyvinyl acetate, ethylene
vinyl alcohol, and derivatives thereof can be used to locally deliver the
oligonucleotides. The oligonucleotides can be incorporated into the
material as it is polymerized or solidified, using melt or solvent
evaporation techniques, or mechanically mixed with the material. In one
embodiment, the oligonucleotides are mixed into or applied onto coatings
for implantable devices such as dextran coated silica beads, stents, or
catheters.

The dose of oligonucleotides is dependent on the size of the
oligonucleotides and the purpose for which it is administered. In general,
the range is calculated based on the surface area of tissue to be treated.
The effective dose of oligonucleotide is somewhat dependent on the length
and chemical composition of the oligonucleotide but is generally in the
range of about 30 to 3000 µg per square centimetre of tissue surface area.
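Because the local dose is expressed per unit of tissue surface, the total amount to prepare scales with the treated area. The following is a minimal sketch, assuming the 30 to 3000 µg per cm² range applies linearly to the area and that the oligonucleotide is carried in hydrogel loaded at about 1 mg per gram (the loading given for the hydrogel embodiment). The function names and the 10 cm² example area are hypothetical.

```python
# Sketch: total local oligonucleotide dose and hydrogel mass for a surface.
# Assumptions (illustrative only): the 30-3000 ug/cm^2 range scales linearly
# with area, and the gel is loaded at ~1 mg oligonucleotide per g of gel.


def local_dose_range_ug(area_cm2: float,
                        low_ug_per_cm2: float = 30.0,
                        high_ug_per_cm2: float = 3000.0) -> tuple[float, float]:
    """Return the (low, high) total dose in micrograms for a treated area."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return area_cm2 * low_ug_per_cm2, area_cm2 * high_ug_per_cm2


def hydrogel_mass_g(dose_ug: float, loading_mg_per_g: float = 1.0) -> float:
    """Grams of loaded hydrogel needed to carry dose_ug of oligonucleotide."""
    if loading_mg_per_g <= 0:
        raise ValueError("loading must be positive")
    return (dose_ug / 1000.0) / loading_mg_per_g   # ug -> mg, then mg / (mg/g)


if __name__ == "__main__":
    low, high = local_dose_range_ug(10.0)           # hypothetical 10 cm^2 surface
    print(low, high)                                # 300.0 30000.0
    print(hydrogel_mass_g(low), hydrogel_mass_g(high))
```

With these assumptions a 10 cm² site would need roughly 0.3 to 30 g of loaded gel; in practice the dose would also depend on oligonucleotide length and chemistry, as the text notes.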

The oligonucleotides may be administered to the patient systemically for
both therapeutic and prophylactic purposes. The oligonucleotides may be
administered by any effective method, for example, parenterally (eg
intravenously, subcutaneously, intramuscularly) or by oral, nasal or other
means which permit the oligonucleotides to access and circulate in the
patient's bloodstream. Oligonucleotides administered systemically
preferably are given in addition to locally administered oligonucleotides,
but also have utility in the absence of local administration. A dosage in
the range of from about 0.1 to about 10 grams per administration to an
adult human generally will be effective for this purpose.

It will be appreciated that the molecules of this aspect of the invention are
useful in treating or preventing any infection caused by the microorganism
from which the said gene has been isolated, or a close relative of said
microorganism. Thus, the said molecule is an antibiotic.



Thus, a twelfth aspect of the invention provides a molecule of the eleventh
aspect of the invention for use in medicine.

A thirteenth aspect of the invention provides a method of treating a host
which has, or is susceptible to, an infection with a microorganism, the
method comprising administering an effective amount of a molecule
according to the eleventh aspect of the invention wherein said gene is
present in said microorganism, or a close relative of said microorganism.

By "effective amount" we mean an amount which substantially prevents
or ameliorates the infection. By "host" we include any animal or plant
which may be infected by a microorganism.

It will be appreciated that pharmaceutical formulations of the molecule of
the eleventh aspect of the invention form part of the invention. Such
pharmaceutical formulations comprise the said molecule together with one
or more acceptable carriers. The carrier(s) must be "acceptable" in the
sense of being compatible with the said molecule of the invention and not
deleterious to the recipients thereof. Typically, the carriers will be water
or saline which will be sterile and pyrogen free.

As mentioned above, and as described in more detail in Example 4 below,
I have found that certain virulence genes are clustered in Salmonella
typhimurium in a region of the chromosome that I have called VGC2.
DNA-DNA hybridisation experiments have determined that sequences
homologous to at least part of VGC2 are found in many species and
strains of Salmonella but are not present in the E. coli and Shigella strains
tested (see Example 4). These sequences almost certainly correspond to
conserved genes, at least in Salmonella, at least some of which are
virulence genes. It is believed that equivalent genes in other Salmonella
species and, if present, equivalent genes in other enteric or other bacteria

Whether a gene within the VGC2 region is a virulence gene is readily
determined. For example, those genes within VGC2 which have been
identified by the method of the second aspect of the invention (when
applied to Salmonella typhimurium and wherein the environment is an
animal such as a mouse) are virulence genes. Virulence genes are also
identified by making a mutation in the gene (preferably a non-polar
mutation) and determining whether the mutant strain is avirulent.
Methods of making mutations in a selected gene are well known and are
described below.

A fourteenth aspect of the invention provides the VGC2 DNA of
Salmonella typhimurium or a part thereof, or a variant of said DNA or a
variant of a part thereof.

The VGC2 DNA of Salmonella typhimurium is depicted diagrammatically
in Figure 8 and is readily obtainable from Salmonella typhimurium ATCC
14028 (available from the American Type Culture Collection, 12301
Parklawn Drive, Rockville, Maryland 20852, USA; also deposited at the
NCTC, Public Health Laboratory Service, Colindale, UK under accession
no. NCTC 12023) using the information provided in Example 4. For
example, probes derived from the sequences shown in Figures 11 and 12
may be used to identify λ clones from a Salmonella typhimurium genomic
library. Standard genome walking methods can be employed to obtain all
of the VGC2 DNA. The restriction map shown in Figure 8 can be used
to identify and locate DNA fragments from VGC2.

By "part of the VGC2 DNA of Salmonella typhimurium" we mean any
DNA sequence which comprises at least 10 nucleotides, preferably at least
20 nucleotides, more preferably at least 50 nucleotides, still more
preferably at least 100 nucleotides, and most preferably at least 500
nucleotides of VGC2. A particularly preferred part of the VGC2 DNA
is the sequence shown in Figure 11, or a part thereof. Another
particularly preferred part of the VGC2 DNA is the sequence shown in
Figure 12, or a part thereof.

Advantageously, the part of the VGC2 DNA is a gene, or part thereof. 

Genes can be identified within the VGC2 region by statistical analysis of
the open reading frames using computer programs known in the art. If an
open reading frame is greater than about 100 codons it is likely to be a
gene (although genes smaller than this are known). Whether an open
reading frame corresponds to the polypeptide coding region of a gene can
be determined experimentally. For example, a part of the DNA
corresponding to the open reading frame may be used as a probe in a
northern (RNA) blot to determine whether mRNA is expressed which
hybridises to the said DNA; alternatively or additionally, a mutation may
be introduced into the open reading frame and the effect of the mutation
on the phenotype of the microorganism can be determined. If the
phenotype is changed then the open reading frame corresponds to a gene.
Methods of identifying genes within a DNA sequence are known in the
art.
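The length criterion above can be illustrated with a simple six-frame scan. This is a minimal sketch, not the statistical programs the text refers to: it assumes ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and a default threshold of 100 codons; `find_orfs` is a hypothetical helper name.

```python
# Sketch: six-frame scan for ATG-initiated open reading frames.
# Assumptions: standard stop codons, ATG as the only start codon and a
# 100-codon threshold; real gene finders also use codon-usage statistics.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int, int]]:
    """Return (strand, frame, start, end, codons) for each ORF found.

    Coordinates are 0-based and end-exclusive on the strand being read
    ("-" positions index into the reverse complement). `codons` counts
    the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None               # resume searching for an ATG
    return orfs


if __name__ == "__main__":
    demo = "ATG" + "GCT" * 150 + "TAA"         # synthetic 151-codon ORF
    print(find_orfs(demo))                     # [('+', 0, 0, 456, 151)]
```

Taking the first in-frame ATG reports the longest possible ORF for each stop codon; as the text stresses, a long ORF is only a candidate until expression or a mutant phenotype confirms it.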

By "variant of said DNA or a variant of a part thereof" we include any
variant as defined by the term "variant" in the seventh aspect of the
invention.

Thus, variants of VGC2 DNA of Salmonella typhimurium include
equivalent genes, or parts thereof, from other Salmonella species, such as
Salmonella typhi and Salmonella enterica, as well as equivalent genes, or
parts thereof, from other bacteria such as other enteric bacteria.

By "equivalent gene" we include genes which are functionally equivalent
and those in which a mutation leads to a similar phenotype (such as
avirulence). It will be appreciated that before the present invention VGC2
or the genes contained therein had not been identified and certainly not
implicated in virulence determination.

Thus, further aspects of the invention provide a mutant bacterium wherein,
if the bacterium normally contains a gene that is the same as or equivalent
to a gene in VGC2, said gene is mutated or absent in said mutant
bacterium; and methods of making a mutant bacterium wherein, if the
bacterium normally contains a gene that is the same as or equivalent to a
gene in VGC2, said gene is mutated or absent in said mutant bacterium.
The following is a preferred method to inactivate a VGC2 gene. One first
subclones the gene on a DNA fragment from a Salmonella λ DNA library
or other DNA library using a fragment of VGC2 as a probe in
hybridisation experiments, maps the gene with respect to restriction
enzyme sites and characterises the gene by DNA sequencing in Escherichia
coli. Using restriction enzymes, one then introduces into the coding
region of the gene a segment of DNA encoding resistance to an antibiotic
(for example, kanamycin), possibly after deleting a portion of the coding
region of the cloned gene by restriction enzymes. Methods and DNA
constructs containing an antibiotic resistance marker are available to
ensure that the inactivation of the gene of interest is preferably non-polar,
that is to say, does not affect the expression of genes downstream from the
gene of interest. The mutant version of the gene is then transferred from
E. coli to Salmonella typhimurium using phage P22 transduction and
transductants checked by Southern hybridisation for homologous

This approach is commonly used in Salmonella (and can be used in S.
typhi), and further details can be found in many papers, including Galan
et al (1992) 174, 4338-4349.

Still further aspects provide a use of said mutant bacterium in a
vaccine; pharmaceutical compositions comprising said bacterium and a
pharmaceutically acceptable carrier; a polypeptide encoded by VGC2
DNA of Salmonella typhimurium or a part thereof, or a variant of a part
thereof; a method of identifying a compound which reduces the ability of
a bacterium to infect or cause disease in a host; a compound identifiable
by said method; a molecule which selectively interacts with, and
substantially inhibits the function of, a gene in VGC2 or a nucleic product
thereof; and medical uses and pharmaceutical compositions thereof.

The VGC2 DNA contains genes which have been identified by the
methods of the first and second aspects of the invention as well as genes
which have been identified by their location (although identifiable by the
methods of the first and second aspects of the invention). These further
aspects of the invention relate closely to the fourth, fifth, sixth, seventh,
eighth, ninth, tenth, eleventh, twelfth and thirteenth aspects of the
invention and, accordingly, the information given in relation to those
aspects, and preferences expressed in relation to those aspects, apply to
these further aspects.

It is preferred if the gene is from VGC2 or is an equivalent gene from
another species of Salmonella such as S. typhi. It is preferred if the
mutant bacterium is an S. typhimurium mutant or a mutant of another
species of Salmonella such as S. typhi.
It is believed that at least some of the genes in VGC2 confer the ability for
the bacterium, such as S. typhimurium, to enter cells.

The invention will now be described with reference to the following
Examples and Figures wherein:

Figure 1 illustrates diagrammatically one particularly preferred method of 
the invention. 

Figure 2 shows a Southern hybridisation analysis of DNA from 12 S.
typhimurium exconjugants following digestion with EcoRV. The filter was
probed with the kanamycin resistance gene of the mini-Tn5 transposon.

Figure 3 shows a colony blot hybridisation analysis of DNA from 48 S.
typhimurium exconjugants from half of a microtitre dish (A1-H6). The
filter was hybridised with a probe comprising labelled amplified tags from
DNA isolated from a pool of the first 24 colonies (A1-D6).

Figure 4 shows a DNA colony blot hybridisation analysis of 95 S.
typhimurium exconjugants of a microtitre dish (A1-H11), which were
injected into a mouse. Replicate filters were hybridised with labelled
amplified tags from the pool (inoculum pattern), or with labelled amplified
tags from DNA isolated from over 10,000 pooled colonies that were
recovered from the spleen of the infected animal (spleen pattern).
Colonies B6, A11 and C8 gave rise to weak hybridisation signals on both
sets of filters. Hybridisation signals from colonies A3, C5, G3 (aroA)
and F10 are present on the inoculum pattern but not on the spleen pattern.

Figure 5 shows the sequence of a Salmonella gene isolated using the
method of the invention and a comparison to the Escherichia coli clp
protease genome.

Figure 6 shows partial sequences of further Salmonella genes isolated using
the method of the invention (SEQ ID Nos. 8 to 36).

Figure 7 shows the mapping of VGC2 on the S. typhimurium
chromosome. (A) DNA probes from three regions of VGC2 were used
in Southern hybridisation analysis of lysates from a set of S. typhimurium
strains harbouring locked-in Mud-P22 prophages. Lysates which
hybridised to a 7.5 kb PstI fragment (probe A in Figure 8) are shown.
The other two probes used hybridised to the same lysates. (B) The
insertion points and packaging directions of the phage are shown along
with the map position in minutes (edition VIII, ref 22 in Example 4). The
phage designations correspond to the following strains: 18P, TT15242;
18Q, TT15241; 19P, TT15244; 19Q, TT15243; 20P, TT15246 and 20Q,
TT15245 (Ref in Example 4). The locations of mapped genes are shown
by horizontal bars and the approximate locations of other genes are
indicated.

Figure 8 shows a physical and genetic map of VGC2. (A) The positions
of 16 transposon insertions are shown above the line. The extent of
VGC2 is indicated by the thicker line. The position and direction of
transcription of ORFs described in the text of Example 4 are shown by
arrows below the line, together with the names of similar genes, with the
exception of ORFs 12 and 13, whose products are similar to the sensor and
regulatory components, respectively, of a variety of two-component
regulatory systems. (B) The location of overlapping clones and an
EcdidlXbal restriction fragment from Mud-P22 prophage strain TT15244
are shown as filled bars. Only the portions of the λ clones which have
been mapped are shown and the clones may extend beyond these limits.
(C) The positions of restriction sites are marked: B, BamHI; E, EcoRI;
V, EcoRV; H, HindIII; P, PstI and X, XbaI. The positions of the 7.5 kb
PstI fragment (probe A) used as a probe in Figure 7 and that of the 2.2
kb PstI/HindIII fragment (probe B) used as a probe in Figure 10 are
shown below the restriction map. The positions of Sequence 1 (described
in Figure 11) and Sequence 2 (described in Figure 12) are shown by the
thin arrows (labelled Sequence 1 and Sequence 2).

Figure 9 describes mapping the boundaries of VGC2. (A) The positions
of mapped genes at minutes 37 to 38 on the E. coli K12 chromosome are
aligned with the corresponding region of the S. typhimurium LT2
chromosome (minutes 30 to 31). An expanded map of the VGC2 region
is shown with 11 S. typhimurium (S. t.) DNA fragments used as probes
(thick bars) and the restriction sites used to generate them: B, BamHI; C,
C/fll; H, HindIII; K, KpnI; P, PstI; N, NsiI and S, SalI. Probes that
hybridised to E. coli K12 (E. c.) genomic DNA are indicated by +; those
which failed to hybridise are indicated by -.

Figure 10 shows that VGC2 is conserved among and specific to the
Salmonellae. Genomic DNA from Salmonella serovars and other
pathogenic bacteria was restricted with PstI (A), HindIII or EcoRV (B)
and subjected to Southern hybridisation analysis, using a 2.2 kb
PstI/HindIII fragment from λ clone 7 as a probe (probe B, Figure 8). The
filters were hybridised and washed under stringent (A) or non-stringent
(B) conditions.

Figure 11 shows the DNA sequence of "Sequence 1" of VGC2 from the
centre to the left-hand end (see the arrow labelled Sequence 1 in Figure
8). The DNA is translated in all six reading frames and the start and stop
positions of putative genes, and the transposon insertion positions for
various mutants identified by STM are indicated (SEQ ID No 37).



As is conventional a * indicates a stop codon and standard nucleotide 
ambiguity codes are used where necessary. 

Figure 12 shows the DNA sequence of "Sequence 2" of VGC2 (cluster C)
(see the arrow labelled Sequence 2 in Figure 8). The DNA is translated
in all six reading frames and the start and stop positions of putative genes,
and the transposon insertion positions for various mutants identified by
STM are indicated (SEQ ID No 38).

As is conventional a * indicates a stop codon and standard nucleotide 
ambiguity codes are used where necessary. 

Figures 7 to 12 are most relevant to Example 4.

Example 1: Identification of virulence genes in Salmonella typhimurium

Materials and Methods

Bacterial Strains and Plasmids 

Salmonella typhimurium strain 12023 (equivalent to American Type
Culture Collection (ATCC) strain 14028) was obtained from the National
Collection of Type Cultures (NCTC), Public Health Laboratory Service,
Colindale, London, UK. A spontaneous nalidixic acid resistant mutant of
this strain (12023 NalR) was selected in our laboratory. Another derivative
of strain 12023, CL1509 (aroA::Tn10), was a gift from Fred Heffron.
Escherichia coli strains CC118 λpir (Δ[ara-leu], araD, ΔlacX74, galE,
galK, phoA20, thi-1, rpsE, rpoB, argE(Am), recA1, λpir phage lysogen)
and S17-1 λpir (TpR, SmR, recA, thi, pro, hsdR-M+, RP4:2-Tc:Mu:Km:Tn7,
λpir) were gifts from Kenneth Timmis. E. coli DH5α was used for
propagating pUC18 (Gibco-BRL) and Bluescript (Stratagene) plasmids
containing S. typhimurium DNA. Plasmid pUTmini-Tn5Km2 (de Lorenzo
et al, 1990) was a gift from Kenneth Timmis.

Construction of semi-random sequence tags and ligations 

The oligonucleotide pool RT1 (5'-CTAGGTACCTACAACCTCAAGCTT-[NK]20-AAGCTTGGTTAGAATGGGTACCATG-3') (SEQ ID No 1), and
primers P2 (5'-TACCTACAACCTCAAGCT-3') (SEQ ID No 2),
P3 (5'-CATGGTACCCATTCTAAC-3') (SEQ ID No 3),
P4 (5'-TACCCATTCTAACCAAGC-3') (SEQ ID No 4) and
P5 (5'-CTAGGTACCTACAACCTC-3') (SEQ ID No 5) were synthesized on an
oligonucleotide synthesizer (Applied Biosystems, model 380B). Double
stranded DNA tags were prepared from RT1 in a 100 µl volume PCR
containing 1.5 mM MgCl2, 50 mM KCl and 10 mM Tris-Cl (pH 8.0) with 200
pg of RT1 as target; 250 µM each dATP, dCTP, dGTP, dTTP; 100 pM of
primers P3 and P5; and 2.5 U of Amplitaq (Perkin-Elmer Cetus). Thermal
cycling conditions were 30 cycles of 95°C for 30 s, 50°C for 45 s, and 72°C
for 10 s. The PCR product was gel purified (Sambrook et al, 1989), passed
through an elutipD column (available from Schleicher and Schuell) and digested
with KpnI prior to ligation into pUC18 or pUTmini-Tn5Km2. For ligations,
plasmids were digested with KpnI and dephosphorylated with calf intestinal
alkaline phosphatase (Gibco-BRL). Linearized plasmid molecules were
gel-purified (Sambrook et al, 1989) prior to ligation to remove any residual
uncut plasmid DNA from the digestion. Ligation reactions contained approximately
50 ng each of plasmid and double stranded tag DNA in a 25 µl volume with 1
unit T4 DNA ligase (Gibco-BRL) in a buffer supplied with the enzyme.
Ligations were carried out for 2 h at 24°C. To determine the proportion of
bacterial colonies arising from either self-ligation of the plasmid DNA or uncut
plasmid DNA, a control reaction was carried out in which the double stranded
tag DNA was omitted from the ligation reaction. This yielded no ampicillin
resistant bacterial colonies following transformation of E. coli CC118
(Sambrook et al, 1989), compared with 185 colonies arising from a ligation
reaction containing the double stranded tag DNA.
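The RT1 and primer sequences given above can be cross-checked with a short script. This sketch assumes an arbitrary stand-in for the random [NK]20 region ("AT" repeated, which satisfies N at odd and K at even positions); `binding_site` and `amplicon_length` are hypothetical helpers. It locates P2 and P5 on the top strand of the left arm and P3 and P4 as reverse complements within the right arm.

```python
# Sketch: locate primers P2-P5 on the RT1 tag oligonucleotide.
# Arm and primer sequences are copied from the text; the 40-nt variable
# region is a hypothetical [NK]20 example used only for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

LEFT_ARM = "CTAGGTACCTACAACCTCAAGCTT"
RIGHT_ARM = "AAGCTTGGTTAGAATGGGTACCATG"
VARIABLE = "AT" * 20                      # N = A, K = T at every pair
RT1 = LEFT_ARM + VARIABLE + RIGHT_ARM

PRIMERS = {
    "P2": "TACCTACAACCTCAAGCT",
    "P3": "CATGGTACCCATTCTAAC",
    "P4": "TACCCATTCTAACCAAGC",
    "P5": "CTAGGTACCTACAACCTC",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def binding_site(primer: str, template: str) -> tuple[str, int]:
    """('top', i) if the primer sequence occurs on the top strand at i;
    ('bottom', i) if its reverse complement does (i is a top-strand index)."""
    i = template.find(primer)
    if i >= 0:
        return "top", i
    j = template.find(reverse_complement(primer))
    if j >= 0:
        return "bottom", j
    raise ValueError("primer has no matching site")


def amplicon_length(forward: str, reverse: str, template: str) -> int:
    """Length of the product from a top-strand and a bottom-strand primer."""
    _, start = binding_site(forward, template)
    _, rev_pos = binding_site(reverse, template)
    return rev_pos + len(reverse) - start


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, *binding_site(seq, RT1), "G residues:", seq.count("G"))
    print("P5/P3 product:", amplicon_length(PRIMERS["P5"], PRIMERS["P3"], RT1))
    print("P2/P4 product:", amplicon_length(PRIMERS["P2"], PRIMERS["P4"], RT1))
```

Under these assumptions P5 and P3 amplify the whole 89-nt oligonucleotide, while P2 and P4 give a 79 bp product, in line with the fragments of about 80 bp excised before the second, labelling PCR. It also confirms that P2 and P4 each contain a single G.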

Bacterial Transformation and Matings

The products of several ligations between pUT mini-Tn5Km2 and the
double stranded tag DNA were used to transform E. coli CC118 (Sambrook
et al, 1989). A total of approximately 10,300 transformants were pooled,
and plasmid DNA extracted from the pool was used to transform E. coli
S-17 (de Lorenzo & Timmis, 1994). For mating experiments, a pool of
approximately 40,000 ampicillin resistant E. coli S-17 λpir transformants
and S. typhimurium 12023 NalR were cultured separately to an optical
density (OD) of 1.0. Aliquots of each culture (0.4 ml) were mixed in 5
ml 10 mM MgSO4 and filtered through a Millipore membrane (0.45 µm
pore size). The filters were placed on the surface of agar containing M9
salts (de Lorenzo & Timmis, 1994) and incubated at 37°C for 16 h. The
bacteria were recovered by shaking the filters in liquid LB medium for 40
min at 37°C, and exconjugants were selected by plating the suspension onto
LB medium containing 100 µg ml-1 nalidixic acid (to select against the donor
strain) and 50 µg ml-1 kanamycin (to select for the recipient strain). Each
exconjugant was checked by transferring nalidixic acid resistant (nalR),
kanamycin resistant (kanR) colonies to MacConkey Lactose indicator medium
(to distinguish between E. coli and S. typhimurium), and to LB medium
containing ampicillin. Approximately 90% of the nalR, kanR colonies were
sensitive to ampicillin, indicating that these resulted from authentic
transposition events (de Lorenzo & Timmis, 1994). Individual
ampicillin-sensitive exconjugants were stored in 96 well microtitre dishes containing
LB medium. For long term storage at -80°C, either 7% DMSO or 15%
glycerol was included in the medium.

Phenotypic characterisation of mutants

Mutants were replica plated from microtitre dishes onto solid medium
containing M9 salts and 0.4% glucose (Sambrook et al, 1989) to identify
auxotrophs. Mutants with rough colony morphology were detected by low
magnification microscopy of colonies on agar plates.

Colony Blots, DNA extractions, PCRs, DNA labelling and hybridisations

For colony blot hybridisations, a 48-well metal replicator (Sigma) was used
to transfer exconjugants from microtitre dishes to Hybond N nylon filters
(Amersham, UK) that had been placed on the surface of LB agar containing
50 µg ml-1 kanamycin. After overnight incubation at 37°C, the filters
supporting the bacterial colonies were removed and dried at room
temperature for 10 min. The bacteria were lysed with 0.4 N NaOH and the
filters washed with 0.5 N Tris-Cl pH 7.0 according to the filter
manufacturer's instructions. The bacterial DNA was fixed to the filters by
exposure to UV light from a Stratalinker (Stratagene). Hybridisations to
32P-labelled probes were carried out under stringent conditions as previously
described (Holden et al, 1989). For DNA extractions, S. typhimurium
transposon mutant strains were grown in liquid LB medium in microtitre
dishes or resuspended in LB medium following growth on solid media.
Total DNA was prepared by the hexadecyltrimethylammonium bromide
(CTAB) method according to Ausubel et al (1987). Briefly, cells from 150
to 1000 µl volumes were pelleted by centrifugation and resuspended in
576 µl TE. To this were added 15 µl of 20% SDS and 3 µl of 20 mg ml-1
proteinase K. After incubating at 37°C for 1 hour, 166 µl of 3 M NaCl
was added and mixed thoroughly, followed by 80 µl of 10% (w/v) CTAB
and 0.7 M NaCl. After thorough mixing, the solution was incubated at
65°C for 10 min. Following extraction with phenol and phenol-chloroform,
the DNA was precipitated by addition of isopropanol, washed with 70%
ethanol and resuspended in TE at a concentration of approximately 1 µg
µl-1.

The DNA samples were subjected to two rounds of PCR to generate
labelled probes. The first PCR was performed in 100 µl reactions
containing 20 mM Tris-Cl pH 8.3; 50 mM KCl; 2 mM MgCl2; 0.01%
Tween 80; 200 µM each dATP, dCTP, dGTP, dTTP; 2.5 units of Amplitaq
polymerase (Perkin-Elmer Cetus); 770 ng each primer P2 and P4; and 5 µg
target DNA. After an initial denaturation of 4 min at 95°C, thermal cycling
consisted of 20 cycles of 45 s at 50°C, 10 s at 72°C, and 30 s at 95°C.
PCR products were extracted with chloroform/isoamyl alcohol (24/1) and
precipitated with ethanol. DNA was resuspended in 10 µl TE and the PCR
products were purified by electrophoresis through a 1.6% Seaplaque (FMC
Bioproducts) gel in TAE buffer. Gel slices containing fragments of about
80 bp were excised and used for the second PCR. This reaction was
carried out in a 20 µl total volume, and contained 20 mM Tris-Cl pH 8.3;
50 mM KCl; 2 mM MgCl2; 0.01% Tween 80; 50 µM each dATP, dTTP,
dGTP; 10 µl 32P-dCTP (3000 Ci/mmol, Amersham); 150 ng each primer P2
and P4; approximately 10 ng of target DNA (1-2 µl of 1.6% Seaplaque
agarose containing the first round PCR product); and 0.5 units of Amplitaq
polymerase. The reaction was overlaid with 20 µl mineral oil and thermal
cycling was performed as described above. Incorporation of the radioactive
label was quantified by adsorption to Whatman DE81 paper (Sambrook et
al, 1989).
Infection Studies

Individual Salmonella exconjugants containing tagged transposons were
grown in 2% tryptone, 1% yeast extract, 0.92% v/v glycerol, 0.5%
Na2HPO4, 1% KNO3 (TYGPN medium) (Ausubel et al, 1987) in microtitre
plates overnight at 37°C. A metal replicator was used to transfer a small
volume of the overnight cultures to a fresh microtitre plate and the cultures
were incubated at 37°C until the OD5B0 (measured using a Titertek
Multiskan microtitre plate reader) was approximately 0.2 in each well.
Cultures from individual wells were then pooled and the OD350 determined
using a spectrophotometer. The culture was diluted in sterile saline to
approximately 5x10^ cfu ml-1. Further dilutions were plated out onto
TYGPN containing nalidixic acid (100 µg ml-1) and kanamycin (50 µg ml-1)
to confirm the cfu present in the inoculum.
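Confirming the inoculum by plating dilutions relies on the standard viable-count formula: colonies divided by the volume plated times the dilution. The sketch below assumes equal representation of each mutant in the pool; the colony count and dilution in the example are hypothetical, while the 0.2 ml injection volume and the 95-mutant pool size are taken from this Example and the Figure 4 legend. The function names are illustrative.

```python
# Sketch: viable-count arithmetic for an STM inoculum.
# Assumptions: standard plate-count formula; equal proportions of each
# mutant in the pool; the example colony count and dilution are hypothetical.


def cfu_per_ml(colonies: int, plated_volume_ml: float, dilution: float) -> float:
    """Viable count of the undiluted suspension.

    `dilution` is the fraction of the original suspension in the plated
    tube, e.g. 1e-4 for a 10^-4 dilution.
    """
    if plated_volume_ml <= 0 or dilution <= 0:
        raise ValueError("volume and dilution must be positive")
    return colonies / (plated_volume_ml * dilution)


def cfu_per_mutant(concentration_cfu_ml: float, injected_ml: float,
                   n_mutants: int) -> float:
    """Average number of cells of each tagged mutant in one injected dose."""
    return concentration_cfu_ml * injected_ml / n_mutants


if __name__ == "__main__":
    conc = cfu_per_ml(colonies=150, plated_volume_ml=0.1, dilution=1e-4)
    dose = cfu_per_mutant(conc, injected_ml=0.2, n_mutants=95)
    print(f"{conc:.3g} cfu/ml, about {dose:.0f} cfu of each mutant per mouse")
```

Dividing by the number of mutants matters in STM: each tagged strain must be present in sufficient numbers for its absence after passage through the host to be meaningful.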

Groups of three female BALB/c mice (20-25 g) were injected
intraperitoneally with 0.2 ml of bacterial suspension containing
approximately 1x10^ cfu ml-1. Mice were sacrificed three days
post-inoculation and their spleens were removed to recover bacteria. Half of
each spleen was homogenized in 1 ml of sterile saline in a microfuge tube.
Cellular debris was allowed to settle and 1 ml of saline containing cells still
in suspension was removed to a fresh tube and centrifuged for two minutes
in a microfuge. The supernatant was aspirated and the pellet resuspended
in 1 ml of sterile distilled water. A dilution series was made in sterile
distilled water and 100 µl of each dilution was plated onto TYGPN agar
containing nalidixic acid (100 µg ml-1) and kanamycin (50 µg ml-1). Bacteria
were recovered from plates containing between 1000 and 4000 colonies, and
a total of over 10,000 colonies recovered from each spleen were pooled and
used to prepare DNA for PCR generation of probes to screen colony blots.
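The comparison of inoculum and spleen patterns amounts to a set difference: tags detected in the input pool but not in the recovered pool identify candidate attenuated mutants. This sketch assumes each hybridisation signal can be scored simply as present or absent (weak signals such as B6, A11 and C8 in Figure 4 would need separate judgement). The spleen set in the example is constructed from the Figure 4 legend rather than from measured data, and the helper names are hypothetical.

```python
# Sketch: STM negative selection as a set difference of hybridisation patterns.
# Assumption: each colony-blot signal is scored as simply present or absent.


def plate_wells(n_wells: int, rows: str = "ABCDEFGH", cols: int = 12) -> list[str]:
    """Well names of a microtitre dish in row-major order (A1..A12, B1, ...)."""
    wells = [f"{row}{col}" for row in rows for col in range(1, cols + 1)]
    return wells[:n_wells]


def well_key(well: str) -> tuple[str, int]:
    """Sort wells by row letter, then numerically by column."""
    return well[0], int(well[1:])


def select_attenuated(inoculum_pattern: set[str], output_pattern: set[str]) -> list[str]:
    """Mutants whose tags hybridise in the inoculum pattern but not in the
    pattern from bacteria recovered from the host."""
    return sorted(inoculum_pattern - output_pattern, key=well_key)


if __name__ == "__main__":
    inoculum = set(plate_wells(95))                  # wells A1-H11, as in Figure 4
    spleen = inoculum - {"A3", "C5", "G3", "F10"}    # signals lost in vivo
    print(select_attenuated(inoculum, spleen))       # ['A3', 'C5', 'F10', 'G3']
```

Each selected well points back to an individually stored mutant, which is what allows the transposon insertion in that mutant to be cloned and sequenced as described below.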

30 

RKHHED SHEET (RULE 91) 



wo 96/17951 PCT/GB95/02875 

Virulence gene cloning and DNA sequencing 

Total DNA was isolated from S. typhimurium exconjugants and digested
separately with SstI, SalI, PstI and SphI. Digests were fractionated through
agarose gels, transferred to Hybond membranes (Amersham) and
subjected to Southern hybridisation analysis using the kanamycin resistance
gene of pUT mini-Tn5Km2 as a probe. The probe was labelled with
digoxigenin (Boehringer-Mannheim) and chemiluminescence detection was
carried out according to the manufacturer's instructions. The hybridisation
and washing conditions were as described above. Restriction enzymes which
gave rise to hybridising fragments in the 3-5 kb range were used to digest
DNA for a preparative agarose gel, and DNA fragments corresponding to
the sizes of the hybridisation signals were excised from this, purified and
ligated into pUC18. Ligation reactions were used to transform E. coli
DH5α to kanamycin resistance. Plasmids from kanamycin-resistant
transformants were purified by passage through an elutipD column and
checked by restriction enzyme digestion. Plasmid inserts were partially
sequenced by the dideoxy method (Sanger et al, 1977) using the -40 primer
and reverse sequencing primer (United States Biochemical Corporation) and
the primers P6 (5'-CCTAGGCGGCCAGATCTGAT-3') (SEQ ID No 6) and
P7 (5'-GCACTTGTGTATAAGAGTCAG-3') (SEQ ID No 7), which anneal
to the I and O termini of Tn5, respectively. Nucleotide sequences and
deduced amino acid sequences were assembled using the MacVector 3.5
software package run on a Macintosh SE/30 computer. Sequences were
compared with the EMBL and GenBank DNA databases using the
UNIX/SUN computer system at the Human Genome Mapping Project
Resource Centre, Harrow, UK.

Results
Tag Design

The structure of the DNA tags is shown in Figure 1a. Each tag consists of
a variable central region flanked by "arms" of invariant sequence. The
central region sequence ([NK]jo) was designed to prevent the occurrence of
sites for the commonly used 6 bp-recognition restriction enzymes, but is
sufficiently variable to ensure that, statistically, the same sequence should
only occur once in 2 x 10'^ molecules (DNA sequencing of 12 randomly
selected tags showed that none shared more than 50% identity over the
variable region). (N means any base (A, G, C or T) and K means G or T.)
The arms contain KpnI sites close to the ends to facilitate the initial cloning
step, and the HindIII sites bordering the variable region were used to release
radiolabelled variable regions from the arms prior to hybridisation analysis.
The arms were also designed such that primers P2 and P4 each contain only
one guanine residue. Therefore, during a PCR using these primers, only one
cytosine will be incorporated into each newly synthesised arm, compared with
an average of ten in the unique sequence. When radiolabelled dCTP is
included in the PCR, an average of ten-fold more label will therefore be
present in the unique sequence than in each arm. This is intended to
minimise background hybridisation signals from the arms after they have
been released from the unique sequences by digestion with HindIII.

Double-stranded tags were ligated into the KpnI site of the mini-Tn5
transposon Km2, carried on plasmid pUT (de Lorenzo & Timmis, 1994).
Replication of this plasmid is dependent on the R6K-specified π product of
the pir gene. It carries the oriT sequence of the RP4 plasmid, permitting
transfer to a variety of bacterial species (Miller & Mekalanos, 1988), and the
tnp* gene needed for transposition of the mini-Tn5 element. The tagged
mini-Tn5 transposons were transferred to S. typhimurium by conjugation, and
288 exconjugants resulting from transposition events were stored in the
wells of microtitre dishes. Total DNA isolated from 12 of these was digested
with EcoRV and subjected to Southern hybridisation analysis using the
kanamycin resistance gene of the mini-Tn5 transposon as a probe. In each
case, the exconjugant had arisen as a result of a single integration of the
transposon into a different site of the bacterial genome (Figure 2).
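Whether the alternating N/K design can contain a given restriction site can be checked mechanically. The sketch below is an illustration only, not part of the original protocol: every window of an NKNK... sequence has K either at its even offsets or at its odd offsets, so a 6 bp site can occur only if one of those two phases places G or T at every K offset (on either strand). It checks HindIII (AAGCTT) and KpnI (GGTACC), the two enzymes this section relies on. The function names and the repeat number used for the random regions are hypothetical choices, since the repeat number is not legible in this copy.

```python
import random

N_BASES = "ACGT"  # IUPAC N: any base
K_BASES = "GT"    # IUPAC K: G or T
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def random_nk_region(n_repeats: int, rng: random.Random) -> str:
    """Hypothetical generator for an [NK]n variable region."""
    return "".join(rng.choice(N_BASES) + rng.choice(K_BASES) for _ in range(n_repeats))


def can_occur_in_nk(site: str) -> bool:
    """Could `site`, on either strand, appear inside a long [NK]n region?"""
    for strand in (site, reverse_complement(site)):
        for k_phase in (0, 1):  # offsets 0,2,4... or 1,3,5... are the K positions
            if all(base in K_BASES for i, base in enumerate(strand) if i % 2 == k_phase):
                return True
    return False


if __name__ == "__main__":
    rng = random.Random(1)
    regions = [random_nk_region(30, rng) for _ in range(1000)]  # 30 repeats: arbitrary
    for name, site in (("HindIII", "AAGCTT"), ("KpnI", "GGTACC")):
        seen = any(site in r or site in reverse_complement(r) for r in regions)
        print(name, "possible:", can_occur_in_nk(site), "seen:", seen)
```

A site such as GTGTGT, by contrast, fits the pattern, which shows the check is not vacuous.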

Specificity and sensitivity studies 

We next determined the efficiency and uniformity of amplification of the
DNA tags in PCRs using pools of exconjugant DNAs as targets. To minimise
unequal amplification of tags in the PCR, we determined the maximum
quantity of target DNA that could be used in a 100 µl reaction, and the
minimum number of PCR cycles, that resulted in products which could be
visualised by ethidium bromide staining of an agarose gel (5 µg DNA and
20 cycles, respectively).

S. typhimurium exconjugants which had reached stationary growth phase in
microtitre dishes were combined and used to extract DNA. This was
subjected to a PCR using primers P2 and P4. PCR products of 80 bp were
gel-purified and used as targets for a second PCR, using the same primers
but with 32P-labelled GTP. This resulted in over 60% of the radiolabelled
dGTP being incorporated into the PCR products. The radiolabelled
products were digested with HindIII and used to probe colony-blotted DNA
from their corresponding microtitre dishes. Of the 1510 mutants tested in
this way, 358 failed to yield a clear signal on an autoradiogram following
an overnight exposure of the colony blot. There are three potential
explanations for this. Firstly, it is possible that a proportion of the
transposons did not carry tags. However, by comparing the transformation
frequencies resulting from ligation reactions involving the transposon in the
presence or absence of tags, it seems unlikely that untagged transposons
could account for more than approximately 0.5% of the total (see Materials
and Methods). More probable causes are that the variable sequence was
truncated in some of the tags, and/or that some of the sequences formed
secondary structures, both of which might have prevented amplification.
Mutants which failed to give clear signals were not included in further
studies. The specificity of the efficiently amplifiable tags was demonstrated
by generating a probe from 24 colonies of a microtitre dish and using it to
probe a colony blot of 48 colonies, which included the 24 used to generate
the probe. The lack of any hybridisation signal from the 24 colonies not
used to generate the probe (Figure 3) shows that the hybridisation conditions
employed were sufficiently stringent to prevent cross-hybridisation among
labelled tags, and suggests that no exconjugant is represented more than
once within a microtitre dish.

There are further considerations in determining the maximum pool size that
can be used as an inoculum in animal experiments. As the quantity of
labelled tag for each transposon is inversely proportional to the complexity
of the tag pool, there is a limit to the pool size above which hybridisation
signals become too weak to be detected after overnight exposure of an
autoradiogram. More importantly, as the complexity of the pool increases,
so does the likelihood that a virulent member of the pool fails to be present
in the spleen of an infected animal in sufficient numbers to produce enough
labelled probe. We have not determined the upper limit for pool size in the
murine model of salmonellosis that we have employed, but it must be in
excess of 96.
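The two pool-size constraints above can be made concrete with a deliberately simple model. This sketch is an assumption-laden illustration, not a calculation from this study: it assumes that every tag amplifies equally, so each tag carries 1/m of the probe label for a pool of m mutants, and that the bacteria recovered from the spleen descend from a hypothetical number of founder cells sampled uniformly from the inoculum, so the chance that a fully virulent mutant is missed follows the Poisson approximation exp(-founders/m). No founder number is measured anywhere in this document.

```python
import math


def label_share_per_tag(pool_size: int) -> float:
    """Fraction of probe label carried by one tag, assuming equal amplification."""
    return 1.0 / pool_size


def p_virulent_mutant_missed(pool_size: int, founders: float) -> float:
    """Poisson approximation: chance that none of `founders` cells (a hypothetical
    bottleneck size) belongs to one particular, fully virulent mutant."""
    return math.exp(-founders / pool_size)


def expected_false_positives(pool_size: int, founders: float) -> float:
    """Expected number of mutants wrongly scored as attenuated if all were virulent."""
    return pool_size * p_virulent_mutant_missed(pool_size, founders)


if __name__ == "__main__":
    founders = 2000  # hypothetical bottleneck size, for illustration only
    for m in (48, 96, 192, 384, 768):
        print(m,
              f"label/tag={label_share_per_tag(m):.4f}",
              f"P(missed)={p_virulent_mutant_missed(m, founders):.2e}",
              f"false positives={expected_false_positives(m, founders):.3f}")
```

Under these assumptions both the label per tag and the protection against spurious "attenuated" calls fall as the pool grows, which is the trade-off the text describes.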

Virulence tests of the transposon mutants 



A total of 1152 uniquely tagged insertion mutants (from twelve microtitre
dishes) were tested for virulence in BALB/c mice in twelve pools, each
representing a 96-well microtitre dish. Animals received an intraperitoneal
injection of approximately 10^3 cells of each of the 96 transposon mutants
of a microtitre dish (10^5 organisms in total). Three days after injection
mice were sacrificed, and bacteria were recovered by plating spleen
homogenates onto laboratory medium. Approximately 10,000 colonies
recovered from each mouse were pooled and DNA was extracted. The tags
present in this DNA sample were amplified and labelled by PCR, and colony
blots were probed and compared with the hybridisation pattern obtained
using tags amplified from the inoculum (Figure 3). As a control, an aroA
mutant of S. typhimurium was tagged and employed as one of the 96
mutants in the inoculum. This strain would not be expected to be recovered
from the spleen because its virulence is severely attenuated (Buchmeier et
al, 1993). Forty-one mutants were identified whose DNA hybridised to
labelled tags from the inoculum but not to labelled tags from bacteria
recovered from the spleen. The experiment was repeated and the same
forty-one mutants were again identified. Two of these were the aroA mutant
(one per pool), as expected. Another was an auxotrophic mutant (it failed to
grow on minimal medium). All of the mutants had normal colony
morphology.
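The comparison at the heart of this screen is a set difference: a mutant is a candidate if its colony hybridised to the probe made from the inoculum but not to the probe made from bacteria recovered from the spleen, and a candidate is accepted only if it is found again in a repeat experiment. A minimal sketch, assuming the autoradiograms have already been scored as sets of microtitre well identifiers; the identifiers and data below are hypothetical.

```python
from typing import Iterable


def attenuated_candidates(inoculum_hits: Iterable[str], spleen_hits: Iterable[str]) -> set:
    """Wells whose tag was detected with the inoculum probe but not with the spleen probe.

    Wells without a clear inoculum signal can never be candidates, mirroring
    the exclusion of poorly amplifying tags described above.
    """
    return set(inoculum_hits) - set(spleen_hits)


def reproducible_candidates(*experiments) -> set:
    """Intersect candidate sets from replicate (inoculum_hits, spleen_hits) experiments."""
    candidate_sets = [attenuated_candidates(inoc, spleen) for inoc, spleen in experiments]
    return set.intersection(*candidate_sets) if candidate_sets else set()


if __name__ == "__main__":
    run1 = ({"A1", "A2", "A3", "B7"}, {"A1", "A3"})
    run2 = ({"A1", "A2", "A3", "B7"}, {"A1", "A3", "B7"})
    print(sorted(reproducible_candidates(run1, run2)))  # ['A2']
```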






Example 2: Cloning and partial characterisation of sequences flanking
the transposon



DNA was extracted from one of the mutants described in Example 1 (pool 1,
F10), digested with SstI and subcloned on the basis of kanamycin
resistance. The sequence of 450 bp flanking one end of the transposon was
determined using primer P7. This sequence shows 80% identity to the E.
coli clp (lon) gene, which encodes a heat-regulated protease (Figure 5). To
our knowledge, this gene has not previously been implicated as a virulence
determinant.




Partial sequences of thirteen further Salmonella typhimurium virulence genes
are shown in Figure 6 (sequences A2 to A9 and B1 to B5). Deduced amino
acid sequences of P2D6, S4C3, P3F4, P7G2 and P9B7 bear similarities to
a family of secretion-associated proteins that have been conserved
throughout bacterial pathogens of animals and plants, and which are known
in Salmonella as the inv family. In S. typhimurium the inv genes are
required for bacterial invasion into intestinal tissue. The virulence of inv
mutants is attenuated when they are inoculated by the oral route, but not
when they are administered intraperitoneally. The discovery of inv-related
genes that are required for virulence following intraperitoneal inoculation
suggests the existence of a new secretion apparatus which might be required
for invasion of non-phagocytic cells of the spleen and other organs. The
products of these new genes might represent better drug targets than the inv
proteins in the treatment of established infections.

Further characterisation of the genes identified in this example is described 
in Example 4. 

Example 3: LD50 determinations and mouse vaccination study

Mutations identified by the method of the invention attenuate virulence. 

Five of the mutations in genes not previously implicated in virulence were
transferred by P22-mediated transduction to the nalidixic acid-sensitive
parent strain of S. typhimurium 12023. Transductants were checked by
restriction mapping and then injected by the intraperitoneal route into groups
of BALB/c mice to determine their 50% lethal dose (LD50). The LD50 values
for mutants S4C3, P7G2, P3F4 and P9B7 were all several orders of
magnitude higher than that of the wild-type strain. No difference in the
LD50 was detected for mutant P1F10; however, there was a statistically
significant decrease in the proportion of P1F10 cells recovered from the
spleens of mice injected with an inoculum consisting of equal proportions
of this strain and the wild-type strain. This implies that this mutation does
attenuate virulence, but to a degree that is not detectable by LD50
determination.
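A mixed-infection result like the one for P1F10 is commonly summarised as a competitive index: the mutant-to-wild-type ratio recovered from the spleen divided by the same ratio in the inoculum, with values below 1 indicating attenuation. This document does not use that term or report the underlying counts, so the function and numbers below are hypothetical; distinguishing mutant from wild-type colonies (for instance by the kanamycin resistance of the mini-Tn5 insertion) is likewise an assumption.

```python
def competitive_index(mutant_in: int, wild_in: int, mutant_out: int, wild_out: int) -> float:
    """(mutant / wild type recovered) divided by (mutant / wild type inoculated)."""
    if min(mutant_in, wild_in, wild_out) <= 0:
        raise ValueError("inoculum counts and recovered wild-type count must be positive")
    return (mutant_out / wild_out) / (mutant_in / wild_in)


if __name__ == "__main__":
    # Hypothetical counts: equal inoculum, mutant under-represented in the spleen.
    print(competitive_index(mutant_in=500, wild_in=500, mutant_out=200, wild_out=800))  # 0.25
```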

Mutants P3F4 and P9B7 were also administered by the oral route at an
inoculum level of 10^ cells/mouse. None of the mice became ill, indicating
that the oral LD50 levels of these mutants are at least an order of magnitude
higher than that of the wild-type strain.

In the mouse vaccination study, groups of five female BALB/c mice of
20-25 g in mass were initially inoculated orally (p.o.) or intraperitoneally
(i.p.) with serial tenfold dilutions of Salmonella typhimurium mutant strains
P3F4 and P9B7. After four weeks the mice were inoculated with 500 c.f.u.
of the parental wild-type strain. Deaths were then recorded over four
weeks.

A group of two mice of the same age and batch as the mice inoculated with
the mutant strains was also inoculated i.p. with 500 c.f.u. of the wild-type
strain as a positive control. Both non-immunised mice died within four
weeks, as expected.

Results are tabulated below: 

1) p.o. initial inoculation with mutant strain P3F4

Initial inoculum   No. of mice surviving   No. of mice surviving
(c.f.u.)           first challenge         wild-type challenge
5 x 10'            5                       2 (40%)
5 x 10»            5                       2 (40%)
5 x 10'            5                       0 (0%)


2) i.p. initial inoculation with mutant strain P3F4

Initial inoculum   No. of mice surviving   No. of mice surviving
(c.f.u.)           first challenge         wild-type challenge
5 x 10*            3                       3 (100%)
5 x 10»            5                       4 (80%)
5 x 10*            6                       5 (83%)
5 x 10*            5                       4 (80%)


3) p.o. initial inoculation with mutant strain P9B7

Initial inoculum   No. of mice surviving   No. of mice surviving
(c.f.u.)           first challenge         wild-type challenge
5 x 10'            5                       0 (0%)



4) i.p. initial inoculation with mutant strain P9B7

Initial inoculum   No. of mice surviving   No. of mice surviving
(c.f.u.)           first challenge         wild-type challenge
5 x 10^            4                       2 (50%)



From these experiments I conclude that mutant P3F4 appears to give some
protection against subsequent wild-type challenge. This protection appears
greater in mice that were immunised i.p.



Example 4: Identification of a virulence locus encoding a second type
III secretion system in Salmonella typhimurium

Abbreviations used in this Example are VGC1, virulence gene cluster 1;
VGC2, virulence gene cluster 2.

Salmonella typhimurium is a principal agent of gastroenteritis in humans and
produces a systemic illness in mice which serves as a model for human
typhoid fever (1). Following oral inoculation of mice with S. typhimurium,
the bacteria pass from the lumen of the small intestine through the intestinal
mucosa, via enterocytes or M cells of the Peyer's patch follicles (2). The
bacteria then invade macrophages and neutrophils, enter the
reticuloendothelial system and disseminate to other organs, including the
spleen and liver, where further reproduction results in an overwhelming and
fatal bacteremia (3). To invade host cells, to survive and replicate in a
variety of physiologically stressful intracellular and extracellular
environments and to circumvent the specific antibacterial activities of the
immune system, S. typhimurium employs a sophisticated repertoire of
virulence factors (4).

To gain a more comprehensive understanding of the virulence mechanisms of
S. typhimurium and other pathogens, we used the transposon mutagenesis
system described in Example 1, conveniently called 'signature-tagged
mutagenesis' (STM), which combines the strength of mutational analysis
with the ability to follow simultaneously the fate of a large number of
different mutants within a single animal (5 and Example 1; reference 5 was
published after the priority date for this invention). Using this approach we
identified 43 mutants with attenuated virulence from a total of 1152 mutants
that were screened. The nucleotide sequences of DNA flanking the
insertion points of transposons in 5 of these mutants showed that they were
related to genes encoding type III secretion systems of a variety of bacterial
pathogens (6, 7). The products of the inv/spa gene cluster of S.
typhimurium (8, 9) are proteins that form a type III secretion system
required for the assembly of surface appendages mediating entry into
epithelial cells (10). Hence the virulence of strains carrying mutations in
the inv/spa cluster is attenuated only if the inoculum is administered orally
and not when it is given intraperitoneally (8). In contrast, the 5 mutants
identified by STM are avirulent following intraperitoneal inoculation (5).

In this example we show that the transposon insertion points of these 5
mutants and of an additional 11 mutants identified by STM all map to the
same region of the S. typhimurium chromosome. Further analysis of this
region reveals additional genes whose deduced products have sequence
similarity to other components of type III secretion systems. This
chromosomal region, which we refer to as virulence gene cluster 2 (VGC2),
is not present in a number of other enteric bacteria and represents an
important locus for S. typhimurium virulence.

Materials and Methods

Bacterial Strains, Transduction and Growth Media. Salmonella enterica
serotypes 5791 (aberdeen), 423180 (gallinarum), 7101 (cubana) and 12416
(typhimurium LT2) were obtained from the National Collections of Type
Cultures, Public Health Laboratory Service, UK. Salmonella typhi BRD123
genomic DNA was a gift from G. Dougan. Enteropathogenic Escherichia
coli (EPEC), enterohemorrhagic E. coli (EHEC), Vibrio cholera biotype El
Tor, Shigella flexneri serotype 2 and Staphylococcus aureus were clinical
isolates obtained from the Department of Infectious Diseases and
Bacteriology, Royal Postgraduate Medical School, UK. Genomic DNA
from Yersinia pestis was a gift from J. Heesemann. However, genomic
DNA can be isolated using standard methods. The bacterial strains and the
methods used to generate signature-tagged mini-Tn5 transposon mutants of
S. typhimurium NCTC strain 12023 have been described previously (5, 11).

Routine propagation of plasmids was in E. coli DH5α. Bacteria were
grown in LB broth (12) supplemented with the appropriate antibiotics.
Before the virulence of individual mutant strains was assessed, the
mutations were first transferred by phage P22-mediated transduction (12) to
the nalidixic acid-sensitive parental strain of S. typhimurium 12023.
Transductants were analysed by restriction digestion and Southern
hybridisation before use as inoculum.



Lambda Library Screening. Lambda (λ) clones with overlapping insert
DNAs covering VGC2 were obtained by standard methods (13) from a
λ1059 library (14) containing inserts from a partial Sau3A digest of S.
typhimurium LT2 genomic DNA. The library was obtained via K.
Sanderson, from the Salmonella Genetic Stock Centre (SGSC), Calgary,
Canada.



Mud-P22 Lysogens. Radiolabelled DNA probes were hybridised to
Hybond N (Amersham) filters bearing DNA prepared from lysates of a set
of S. typhimurium strains harbouring Mud-P22 prophages at known
positions in the S. typhimurium genome. Mitomycin-induced Mud-P22
lysates were prepared as described (12, 15). The set of Mud-P22 prophages
was originally assembled by Benson and Goldman (16) and was obtained
from the SGSC.

Gel Electrophoresis and Southern Hybridisation. Gel electrophoresis was
performed in 1% or 0.6% agarose gels run in 0.5 x TBE. Gel-fractionated
DNA was transferred to Hybond N or N+ membranes (Amersham), and
stringent hybridisation and washing procedures (permitting hybridisation
between nucleotide sequences with 10% or fewer mismatches) were as
described by Holden et al. (17). For non-stringent conditions (permitting
hybridisation between sequences with 50% mismatches), filters were
hybridised overnight at 42°C in 10% formamide/0.25 M Na2HPO4/7% SDS,
and the most stringent wash step was with 20 mM Na2HPO4/1% SDS at
42°C. DNA fragments used as probes were labelled with [32P]dCTP using
the 'Radprime' system (Gibco-BRL) or with [digoxigenin-11]dUTP and
detected using the Digoxigenin system (Boehringer Mannheim) according to
the manufacturers' instructions, except that hybridisation was performed in
the same solution as that used for radioactively labelled probes. Genomic
DNA was prepared for Southern hybridisation as described previously (13).



Molecular Cloning and Nucleotide Sequencing. Restriction endonucleases
and T4 DNA ligase were obtained from Gibco-BRL. General molecular
biology techniques were as described in Sambrook et al. (18). Nucleotide
sequencing was performed by the dideoxy chain termination method (19)
using a T7 sequencing kit (Pharmacia). Sequences were assembled with the
MacVector 3.5 software or AssemblyLIGN packages. Nucleotide and
derived amino acid sequences were compared with those in the European
Molecular Biology Laboratory (EMBL) and SwissProt databases using the
BLAST and FASTA programs of the GCG package from the University of
Wisconsin (version 8) (20) on the network service at the Human Genome
Mapping Project Resource Centre, Hinxton, UK.

Virulence Tests. Groups of five female BALB/c mice (20-25 g) were
inoculated orally (p.o.) or intraperitoneally (i.p.) with 10-fold dilutions of
bacteria suspended in physiological saline. For preparation of the inoculum,
bacteria were grown overnight at 37°C in LB broth with shaking (50 rpm)
and then used to inoculate fresh medium for various lengths of time until an
optical density (OD) at 560 nm of 0.4 to 0.6 had been reached. For cell
densities of 5 x 10* colony forming units (cfu) per ml and above, cultures
were concentrated by centrifugation and resuspended in saline. The
concentration of cfu/ml was checked by plating a dilution series of the
inoculum onto LB agar plates. Mice were inoculated i.p. with 0.2 ml
volumes and p.o. by gavage with the same volume of inoculum. The LD50
values were calculated after 28 days by the method of Reed and Muench
(21).
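The Reed and Muench method cited above interpolates the 50% end point from cumulative totals: deaths are accumulated from the lowest dose upwards (an animal killed by a small dose is assumed to die at any larger dose) and survivors from the highest dose downwards, and the LD50 is interpolated on a log-dose scale between the two doses that bracket 50% cumulative mortality. A minimal sketch of that calculation follows; the dose-mortality data in the usage example are invented for illustration and are not results from this study.

```python
import math
from typing import Sequence


def reed_muench_ld50(doses: Sequence[float], dead: Sequence[int], alive: Sequence[int]) -> float:
    """Estimate the LD50 by the Reed-Muench method.

    `doses` must be strictly increasing (e.g. tenfold dilutions in cfu);
    dead[i] and alive[i] are the outcomes in the group given doses[i].
    """
    n = len(doses)
    if not (n == len(dead) == len(alive)) or n < 2:
        raise ValueError("need at least two dose groups with matching counts")
    if any(b <= a for a, b in zip(doses, doses[1:])):
        raise ValueError("doses must be strictly increasing")
    if any(d + a == 0 for d, a in zip(dead, alive)):
        raise ValueError("every dose group needs at least one animal")
    cum_dead = [sum(dead[: i + 1]) for i in range(n)]   # lowest dose upwards
    cum_alive = [sum(alive[i:]) for i in range(n)]      # highest dose downwards
    pct = [100.0 * d / (d + a) for d, a in zip(cum_dead, cum_alive)]
    for i in range(n - 1):
        lo, hi = pct[i], pct[i + 1]
        if lo < 50.0 <= hi:
            proportionate_distance = (50.0 - lo) / (hi - lo)
            log_lo, log_hi = math.log10(doses[i]), math.log10(doses[i + 1])
            return 10 ** (log_lo + proportionate_distance * (log_hi - log_lo))
    raise ValueError("the dose range does not bracket 50% mortality")


if __name__ == "__main__":
    doses = [1e2, 1e3, 1e4, 1e5, 1e6]  # hypothetical tenfold dilution series
    ld50 = reed_muench_ld50(doses, dead=[0, 1, 3, 5, 5], alive=[5, 4, 2, 0, 0])
    print(f"LD50 = {ld50:.2e} cfu")  # about 4.8 x 10^3 cfu
```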

Results

Localisation of Transposon Insertions. The generation of a bank of
Salmonella typhimurium mini-Tn5 transposon mutants and the screen used
to identify 43 mutants with attenuated virulence have been described
previously (5). Transposons and flanking DNA regions were cloned from
exconjugants by selection for kanamycin resistance or by inverse PCR.
Nucleotide sequences of 300-600 bp of DNA flanking the transposons were
obtained for 33 mutants. Comparison of these sequences with those in the
DNA and protein databases indicated that 14 mutants resulted from
transposon insertions into previously known virulence genes, 7 arose from
insertions into new genes with similarity to known genes of the
enterobacteria and 12 resulted from insertions into sequences without
similarity to entries in the DNA and protein databases (ref. 5, Example 1
and this Example).

Three lines of evidence suggested that 16 of the 19 transposon insertions
into new sequences were clustered in three regions of the genome, initially
designated A, B and C. First, comparing nucleotide sequences from regions
flanking transposon insertion points with each other and with those in the
databases showed that some sequences overlapped with one another or had
strong similarity to different regions of the same gene. Second, Southern
analysis of genomic DNA digested with several restriction enzymes and
probed with restriction fragments flanking transposon insertion points
indicated that some transposon insertions were located on the same
restriction fragments. Third, when the same DNA probes were hybridised
to plaques from an S. typhimurium λ DNA library, the probes from mutants
which the previous two steps had suggested might be linked were found to
hybridise to the same λ DNA clones. Thus two mutants (P9B7 and P12F5)
were assigned to cluster A, five mutants (P2D6, P9B6, P11C3, P11D10 and
P11H10) to cluster B and nine mutants (P3F4, P4F8, P7A3, P7B8, P7G2,
P8G12, P9G4, P10E11 and P11B9) to cluster C (Figure 8).
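The third line of evidence amounts to a connected-components problem: mutants whose flanking probes hybridise to a common λ clone belong to the same cluster, and the links are transitive. The sketch below groups mutants with a small union-find structure. The λ clone identifiers are hypothetical, because the document does not list which clones each probe detected; the same grouping would apply to the later step in which overlapping λ clones are assembled into one contiguous region.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cluster_mutants(hits: Dict[str, Set[str]]) -> List[List[str]]:
    """Group mutants that share a hybridising lambda clone, directly or through other mutants."""
    parent = {m: m for m in hits}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_mutant_for_clone: Dict[str, str] = {}
    for mutant, clones in hits.items():
        for clone in clones:
            if clone in first_mutant_for_clone:
                union(first_mutant_for_clone[clone], mutant)
            else:
                first_mutant_for_clone[clone] = mutant

    groups: Dict[str, List[str]] = defaultdict(list)
    for mutant in hits:
        groups[find(mutant)].append(mutant)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hypothetical_hits = {
        "P9B7": {"clone_a"},
        "P12F5": {"clone_a", "clone_b"},
        "P2D6": {"clone_e"},
        "P9B6": {"clone_e"},
        "P3F4": {"clone_h"},
    }
    print(cluster_mutants(hypothetical_hits))
    # [['P12F5', 'P9B7'], ['P2D6', 'P9B6'], ['P3F4']]
```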

Hybridisation of DNA probes from these three clusters to lysates from a set
of S. typhimurium strains harbouring locked-in Mud-P22 prophages (15, 16)
showed that the three loci were all located in the minute 30 to 31 region
(edition VIII, ref. 22) (Figure 7), indicating that the three loci were closely
linked or constituted one large virulence locus. To determine whether any of
the λ clones covering clusters A, B and C contained overlapping DNA
inserts, DNA fragments from the terminal regions of each clone were used
as probes in Southern hybridisation analysis of the other λ clones.
Hybridising DNA fragments showed that several λ clones overlap and that
clusters A, B and C comprise one contiguous region (Figure 8). DNA
fragments from the ends of this region were then used to probe the λ library
to identify further clones containing inserts representing the adjacent
regions. No λ clones were identified that covered the extreme right-hand
terminus of the locus, so this region was obtained by cloning a 6.5 kb
EcoRI/XbaI fragment from a lysate of the Mud-P22 prophage strain
TT15244 (16).

Restriction mapping and Southern hybridisation analysis were then used to
construct a physical map of this locus (Figure 8). To distinguish this locus
from the well-characterised inv/spa gene cluster at minute 63 (edition VIII,
ref. 22) (8, 9, 23, 24, 25, 26), we refer to the latter as virulence gene
cluster 1 (VGC1) and have termed the new virulence locus VGC2. Figure
2 shows the position of two portions of DNA whose nucleotide sequence
has been determined ("Sequence 1" and "Sequence 2"). The nucleotide
sequence is shown in Figures 11 and 12.

Mapping the boundaries of VGC2 on the S. typhimurium chromosome.

Nucleotide sequencing of λ clone 7 at the left-hand side of VGC2 revealed
the presence of an open reading frame (ORF) whose deduced amino acid
sequence is over 90% identical to the derived product of a segment of the
ydhE gene of E. coli. Sequencing of the 6.5 kb EcoRI/XbaI cloned
fragment on the right-hand side of VGC2 revealed the presence of an ORF
whose predicted amino acid sequence is over 90% identical to pyruvate
kinase I of E. coli, encoded by the pykF gene (27). On the E. coli
chromosome, ydhE and pykF are located close to one another, at minute 37
to 38 (28). Eleven non-overlapping DNA fragments distributed along the
length of VGC2 were used as probes in non-stringent Southern hybridisation
analysis of E. coli and S. typhimurium genomic DNA. Hybridising DNA
fragments showed that a region of approximately 40 kb comprising VGC2
was absent from the E. coli genome, and localised the boundaries of VGC2
to within 1 kb (Figure 9). Comparison of the location of the XbaI site close
to the right-hand end of VGC2 (Figure 8) with a map of known XbaI sites
(29) in the minute 30 region of the chromosome (22) enables a map
position of 30.7 minutes to be deduced for VGC2.

Structure of VGC2. Nucleotide sequencing of portions of VGC2 has
revealed the presence of 19 ORFs (Figure 8). The G+C content of
approximately 26 kb of nucleotide sequence within VGC2 is 44.6%,
compared with 47% for VGC1 (9) and 51-53% estimated for the entire
Salmonella genome (30).
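G+C content is the percentage of G and C among the unambiguous bases of a sequence; scanning it in windows is one common way to spot segments, such as VGC2, whose composition departs from the genome average (51-53% according to the text). A minimal sketch follows; the window size, step and threshold are illustrative choices left to the caller, and the example input is primer P6 from Example 1, not VGC2 sequence.

```python
from typing import Iterator, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G+C among A/C/G/T bases; ambiguous symbols such as N are ignored."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def low_gc_windows(seq: str, window: int, step: int, threshold: float) -> Iterator[Tuple[int, float]]:
    """Yield (start, G+C %) for each window whose G+C content is below `threshold`."""
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_percent(seq[start:start + window])
        if gc < threshold:
            yield start, gc


if __name__ == "__main__":
    print(gc_percent("CCTAGGCGGCCAGATCTGAT"))  # primer P6: 60.0
```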

The complete deduced amino acid sequences of ORFs 1-11 are similar to
those of proteins of type III secretion systems (6, 7), which are known to
be required for the export of virulence determinants in a variety of bacterial
pathogens of plants and animals (7). The predicted proteins of ORFs 1-8
(Figure 8) are similar in organisation and sequence to the products of the
yscN-U genes of Yersinia pseudotuberculosis (31), to invC/spaS of the
inv/spa cluster in VGC1 of Salmonella typhimurium (8, 9) and to
spa47/spa40 of the spa/mxi cluster of Shigella flexneri (32, 33, 34, 35).
For example, the predicted amino acid sequence of ORF 3 (Figure 8) is 50%
identical to YscS of Y. pseudotuberculosis (31), 34% identical to Spa9 of
S. flexneri (35) and 37% identical to SpaQ of VGC1 of S. typhimurium (9).
The predicted protein product of ORF9 is closely related to the LcrD family
of proteins, with 43% identity to LcrD of Y. enterocolitica (36), 39%
identity to MxiA of S. flexneri (32) and 40% identity to InvA of VGC1
(23). Partial nucleotide sequences for the remaining ORFs shown in Figure
8 indicate that the predicted protein from ORF10 is most similar to Y.
enterocolitica YscJ (37), a lipoprotein located in the bacterial outer
membrane, and that ORF11 is similar to S. typhimurium InvG, a member of
the PulD family of translocases (38). ORF12 and ORF13 show significant
similarity to the sensor and regulatory subunits, respectively, of a variety
of two-component regulatory systems (39). There is ample coding capacity
for further genes between ORFs 9 and 10, between ORFs 10 and 11, and
between ORF 19 and the right-hand end of VGC2.
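Percent identities such as those quoted above depend on how a pairwise alignment is produced and scored, and BLAST and FASTA each apply their own conventions. As a hedged illustration of one simple convention, the sketch below counts identical residues over the aligned columns in which neither sequence has a gap; the short aligned fragments are invented and are not the ORF3 or ORF9 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over gap-free columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("MKV-LT", "MRVALT"))  # 80.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so the same alignment can yield different figures.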

VGC2 is conserved among and is specific to the Salmonellae. A 2.2 kb
PstI/HindIII fragment located at the centre of VGC2 (probe B, Figure 8),
lacking sequence similarity to entries in the DNA and protein databases, was
used as a probe in Southern hybridisation analysis of genomic DNA from
Salmonella serovars and other pathogenic bacteria (Figure 10A). DNA
fragments hybridising under non-stringent conditions showed that VGC2 is
present in S. aberdeen, S. gallinarum, S. cubana and S. typhi and is absent
from EPEC, EHEC, Y. pestis, S. flexneri, V. cholera and S. aureus. Thus
VGC2 is conserved among, and is likely to be specific to, the Salmonellae.

To determine whether the organisation of the locus is conserved among the
Salmonella serovars tested, stringent Southern hybridisations with genomic
DNA digested with two further restriction enzymes were carried out.
Hybridising DNA fragments showed that there is some heterogeneity in the
arrangement of restriction sites between S. typhimurium LT2 and S.
gallinarum, S. cubana and S. typhi (Figure 10B). Furthermore, S.
gallinarum and S. typhi contain hybridising fragments in addition to those
present in the other Salmonellae examined, suggesting that regions of VGC2
have been duplicated in these species.



VGC2 is required for virulence in mice. Previous experiments showed
that the LD50 values for i.p. inoculation of transposon mutants P3F4, P7G2,
P9B7 and P11C3 were at least 100-fold greater than that of the wild-type
strain (5). To clarify the importance of VGC2 in the process of infection,
the p.o. and i.p. LD50 values for mutants P3F4 and P9B7 were determined
(Table 1). Both mutants showed a reduction in virulence of at least five
orders of magnitude by either route of inoculation in comparison with the
parental strain. This profound attenuation of virulence by both routes of
inoculation demonstrates that VGC2 is required for events in the infective
process after epithelial cell penetration in BALB/c mice.



Table 1. LD50 values of S. typhimurium strains.

                     LD50 (cfu)
Strain               i.p.            p.o.
12023 wild type      4.2             6.2 x 10*
P3F4                 1.5 x 10*       >5 x 10'
P9B7                 >1.5 x 10*      >5 x 10'

cfu, colony forming units

Discussion

A hitherto unknown virulence locus of approximately 40 kb in S.
typhimurium has been identified and located at minute 30.7 on the
chromosome by mapping the insertion points of a group of signature-tagged
transposon mutants with attenuated virulence (5). This locus is referred to
as virulence gene cluster 2 (VGC2) to distinguish it from the inv/spa
virulence genes at 63 minutes (edition VIII, ref. 22), which we suggest be
renamed VGC1. VGC1 and VGC2 both encode components of type III
secretion systems. However, these secretion systems are functionally
distinct.

Of 19 mutants that arose from insertions into new genes (ref. 5 and this
example), 16 mapped to the same region of the chromosome. It is possible
that mini-Tn5 insertion occurs preferentially in VGC2. Alternatively, as the
negative selection used to identify mutants with attenuated virulence (5) was
very stringent (reflected by the high LD50 values for VGC2 mutants), it is
possible that, among the previously unknown genes, only mutations in those
of VGC2 result in a degree of attenuation sufficient to be recovered in the
screen. The failure of previous searches for S. typhimurium virulence
determinants to identify VGC2 might stem from reliance on cell culture
assays rather than a live animal model of infection. A previous study which
identified regions of the S. typhimurium LT2 chromosome unique to
Salmonellae (40) located one such region (RF333) to minutes 30.5-32.
Therefore, RF333 may correspond to VGC2, although it was not known
that RF333 was involved in virulence determination.

Comparisons with the type III secretion systems encoded by the virulence
plasmids of Yersinia and Shigella, as well as with VGC1 of Salmonella,
indicate that VGC2 encodes the basic structural components of the
secretory apparatus. Furthermore, the order of ORFs 1-8 in VGC2 is the
same as the order of their homologues in Yersinia, Shigella and VGC1 of
S. typhimurium. The fact that the organisation and structure of the VGC2
secretion system is no more closely related to VGC1 than to the
corresponding genes of Yersinia, together with the low G+C content of
VGC2, suggests that VGC2, like VGC1 (40, 41, 42), was acquired
independently by S. typhimurium via horizontal transmission. The proteins
encoded by ORFs 12 and 13 show strong similarity to bacterial
two-component regulators (39) and could regulate ORFs 1-11, the
secreted proteins of this system, or both.
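The G+C argument can be illustrated with a minimal Python sketch that scans a sequence in fixed windows and reports the G+C fraction of each; a horizontally acquired island would appear as a run of low-G+C windows. The window size, step and the toy sequences are assumptions for illustration only, not VGC2 or Salmonella genome data.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases (others ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in acgt) / len(acgt)


def sliding_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """G+C fraction of each `window`-bp window, starting every `step` bp (0-based)."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy data: a low-G+C "island" inserted into a G+C-rich background.
background = "GCGCATGCCG" * 20   # 200 bp, 80% G+C
island = "ATTATAGCAT" * 10       # 100 bp, 20% G+C
chromosome = background + island + background

for start, gc in sliding_gc(chromosome, window=100, step=100):
    print(f"{start:4d}-{start + 99:4d}: {gc:.0%}")
```

Non-overlapping windows keep the output short; a smaller step gives a smoother profile at the cost of correlated neighbouring values.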

Many genes in VGC1 have been shown to be important for entry of
S. typhimurium into epithelial cells. This process requires bacterial contact (2)
and results in cytoskeletal rearrangements leading to localised membrane
ruffling (43, 44). The role of VGC1, and its restriction to this stage of the
infection, is reflected in the approximately 50-fold attenuation of virulence
in BALB/c mice inoculated p.o. with VGC1 mutants and in the fact that
VGC1 mutants show no loss of virulence when administered i.p. (8). The
second observation also explains why no VGC1 mutants were obtained in
our screen (5). In contrast, mutants in VGC2 are profoundly attenuated
following both p.o. and i.p. inoculation. This shows that, unlike VGC1,
VGC2 is required for virulence in mice after epithelial cell penetration;
these findings do not, however, exclude a role for VGC2 in this early stage
of infection.

Thus, in summary, mapping the insertion points of 16 signature-tagged
transposon mutants on the Salmonella typhimurium chromosome led to the
identification of a 40 kb virulence gene cluster at minute 30.7. This locus
is conserved among all other Salmonella species examined, but is not present
in a variety of other pathogenic bacteria or in Escherichia coli K-12.
Nucleotide sequencing of a portion of this locus revealed 11 open reading
frames whose predicted products are components of a type III secretion
system. To distinguish between this and the type III secretion system
encoded by the inv/spa invasion locus, we refer to the inv/spa locus as
virulence gene cluster 1 (VGC1) and the new locus as VGC2. VGC2 has
a lower G+C content than the rest of the Salmonella genome and is flanked by
genes whose products share greater than 90% identity with those of the
E. coli ydhE and pykF genes. VGC2 was therefore probably acquired
horizontally by insertion into a region corresponding to that between the
ydhE and pykF genes of E. coli. Virulence studies of VGC2 mutants have
shown them to be attenuated by at least five orders of magnitude compared
with the wild-type strain following oral or intraperitoneal inoculation.
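Attenuation expressed in orders of magnitude is the base-10 logarithm of the ratio of mutant LD50 to wild-type LD50. The sketch below assumes that definition; the numbers in it are hypothetical and are not taken from the LD50 table above. Where a mutant LD50 is reported only as a lower bound (">"), the computed attenuation is likewise only a lower bound.

```python
import math


def log10_attenuation(ld50_mutant: float, ld50_wild_type: float) -> float:
    """Orders of magnitude by which the mutant LD50 exceeds the wild-type LD50.

    If the mutant LD50 is only known as a lower bound (no dose tested killed
    half the animals), the returned value is also a lower bound.
    """
    if ld50_mutant <= 0 or ld50_wild_type <= 0:
        raise ValueError("LD50 values must be positive")
    return math.log10(ld50_mutant / ld50_wild_type)


# Hypothetical illustrative values in cfu, not measured data.
wild_type = 1.0e2
mutant_lower_bound = 5.0e7
print(f">= {log10_attenuation(mutant_lower_bound, wild_type):.1f} orders of magnitude")
```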
Example 5: Identification of virulence genes in Streptococcus pneumoniae

(a) Mutagenesis 

In the absence of a convenient transposon system, the most efficient way of
creating tagged mutants of Streptococcus pneumoniae is to use
insertion-duplication mutagenesis (Morrison et al (1984) J. Bacteriol. 159,
870). Random S. pneumoniae DNA fragments of 200-400 bp are generated
either by digesting genomic DNA with a restriction enzyme or by physical
shearing by sonication, followed by gel fractionation and DNA end-repair
using T4 DNA polymerase. The fragments are ligated into plasmid pJDC9
(Pearce et al (1993) Mol. Microbiol. 9, 1037), which carries the erm gene
for erythromycin selection in E. coli and S. pneumoniae and has previously
been modified by incorporation of DNA sequence tags into one of the
polylinker cloning sites. Cloned S. pneumoniae DNA of this size is
sufficient to ensure homologous recombination, and reduces the possibility
of generating an unrepresentative library in E. coli (expression of
S. pneumoniae proteins can be toxic to E. coli). Alternative vectors carrying
different selectable markers are available and can be used in place of
pJDC9. Tagged plasmids carrying DNA fragments are introduced into an
appropriate S. pneumoniae strain selected on the basis of serotype and
virulence in a murine model of pneumococcal pneumonia. Competence for
genetic transformation in S. pneumoniae is regulated by competence factor,
a peptide of 17 amino acids recently characterized by Don Morrison's
group at the University of Illinois at Chicago and described in Havarstein,
Coomaraswamy and Morrison (1995) Proc. Natl. Acad. Sci. USA 92,
11140-11144. Adding minute quantities of this peptide to transformation
experiments gives very high transformation frequencies in some
encapsulated clinical isolates of S. pneumoniae. This overcomes a major
hurdle in pneumococcal molecular genetics: the availability of the
peptide greatly facilitates the construction of S. pneumoniae mutant banks
and allows flexibility in choosing the strain(s) to be mutated. A proportion
of transformants are analysed to verify homologous integration of the
plasmid sequences, and checked for stability. Reversion of mutants
generated by insertion-duplication is already very low, and is minimized
further because the duplicated regions are short (200-400 bp); however, if
the level of reversion is unacceptably high, antibiotic selection is
maintained during growth of the transformants in culture and in the animal.
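The fragment-generation step can be pictured as an in silico digest followed by a 200-400 bp size window that mimics gel fractionation. This is a sketch under stated assumptions: the text does not name the restriction enzyme, so a 4-bp cutter such as Sau3AI (cutting ^GATC) is assumed, only the top strand is modelled, and the input is a random toy sequence rather than S. pneumoniae DNA.

```python
import random
import re


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut `seq` (top strand only) at every occurrence of `site`.

    `cut_offset` is the cut position within the site (0 for Sau3AI, ^GATC).
    A lookahead is used so that overlapping occurrences are also found.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], low: int = 200, high: int = 400) -> list[str]:
    """Keep fragments whose length lies in [low, high], like excising a gel slice."""
    return [f for f in fragments if low <= len(f) <= high]


random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(20_000))  # hypothetical toy "genome"
fragments = digest(genome, "GATC", cut_offset=0)
selected = size_select(fragments)
print(len(fragments), "fragments;", len(selected), "in the 200-400 bp window")
```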

(b) Animal model

The S. pneumoniae mutant bank is organized into pools for inoculation into
Swiss and/or C57BL/6 mice. Preliminary experiments are conducted to
determine the optimum complexity of the pools and the optimum inoculum
level. One attractive model uses inocula of 10^? cfu, delivered by mouth
to the trachea (Veber et al (1993) J. Antimicrobial Chemotherapy 32, 473).
Swiss mice develop acute pneumonia within 3-4 days, and C57BL/6 mice
develop subacute pneumonia within 8-10 days. These pulmonary models
of infection yield 10^? cfu/lung at the time of death (Veber et al (1993)
J. Antimicrobial Chemotherapy 32, 473). If required, mice are also
injected intraperitoneally for the identification of genes required for
bloodstream infection (Sullivan et al (1993) Antimicrobial Agents and
Chemotherapy 37, 234).
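One way to reason about the optimum pool complexity and inoculum level is a simple sampling model: if an inoculum of N cfu contains k mutants in equal proportions, each mutant is present at a mean of N/k cells, and under a Poisson assumption the chance that a given mutant is missing from the inoculum altogether is exp(-N/k). Such mutants would be lost by chance and misread as attenuated. The sketch assumes equal representation and Poisson sampling, ignores bottlenecks inside the host, and uses hypothetical pool sizes and inocula.

```python
import math


def prob_mutant_absent(inoculum_cfu: float, pool_size: int) -> float:
    """P(a given mutant is absent from the inoculum), assuming each of
    `pool_size` mutants is equally represented and cells are Poisson-sampled."""
    mean_per_mutant = inoculum_cfu / pool_size
    return math.exp(-mean_per_mutant)


def expected_false_positives(inoculum_cfu: float, pool_size: int) -> float:
    """Expected number of mutants missing from the inoculum by chance alone;
    these would be scored as 'not recovered' even if fully virulent."""
    return pool_size * prob_mutant_absent(inoculum_cfu, pool_size)


# Hypothetical pool sizes and inocula, for illustration only.
for pool_size in (48, 96, 192):
    for inoculum in (1e3, 1e4, 1e5):
        print(pool_size, int(inoculum), f"{expected_false_positives(inoculum, pool_size):.2e}")
```

Larger pools screen more mutants per animal but, at a fixed inoculum, raise the number of chance losses; the model only frames the trade-off that the preliminary experiments settle empirically.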


(c) Virulence gene identification

Once the parameters of the infection model are optimized, a mutant bank
consisting of several thousand strains is subjected to virulence tests.
Mutants with attenuated virulence are identified by hybridisation analysis,
using labelled tags from the 'input' and 'recovered' pools as probes. If
S. pneumoniae DNA cannot easily be colony blotted, chromosomal DNA is
liberated chemically or enzymatically in the wells of microtitre dishes prior
to transfer onto nylon membranes using a dot-blot apparatus. DNA flanking
the integrated plasmid is cloned by plasmid rescue in E. coli (Morrison et
al (1984) J. Bacteriol. 159, 870), and sequenced. Genomic DNA libraries
are constructed in appropriate vectors maintained in either E. coli or a
Gram-positive host strain, and are probed with restriction fragments
flanking the integrated plasmid to isolate cloned virulence genes, which are
then fully sequenced and subjected to detailed functional analysis.
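Here the comparison between 'input' and 'recovered' pools is made by hybridisation. As a sequence-based analogue (a swapped-in technique, not the method described here), the same negative selection, keeping mutants whose tags are present in the inoculum but absent after passage, can be sketched in Python. The common arm sequences, tag lengths and mutant names are hypothetical; the extraction step relies on each tag being flanked by invariant sequences, as recited in Claim 23.

```python
import re
from collections import Counter

# Hypothetical invariant arms flanking every variable tag (illustrative only).
LEFT_ARM = "ACGTTGCA"
RIGHT_ARM = "TGCAACGT"
TAG_RE = re.compile(LEFT_ARM + r"([ACGT]+?)" + RIGHT_ARM)


def extract_tags(reads: list[str]) -> Counter:
    """Count the variable tags found between the common arms in each read."""
    counts: Counter = Counter()
    for read in reads:
        for match in TAG_RE.finditer(read):
            counts[match.group(1)] += 1
    return counts


def attenuated_candidates(input_pool: dict[str, str], recovered: Counter, min_count: int = 1) -> list[str]:
    """Mutants whose tag was in the inoculum but is absent (or below
    `min_count`) among the recovered reads: the negative-selection step."""
    return sorted(name for name, tag in input_pool.items() if recovered[tag] < min_count)


# Hypothetical pool: mutant name -> tag.
input_pool = {"mutant_A": "AAAACCCC", "mutant_B": "GGGGTTTT", "mutant_C": "ACACACAC"}
recovered_reads = [
    LEFT_ARM + "AAAACCCC" + RIGHT_ARM,
    LEFT_ARM + "ACACACAC" + RIGHT_ARM,
    LEFT_ARM + "ACACACAC" + RIGHT_ARM,
]
recovered = extract_tags(recovered_reads)
candidates = attenuated_candidates(input_pool, recovered)
print(candidates)  # -> ['mutant_B']
```

The `min_count` threshold plays the role of a hybridisation signal cut-off; in practice it would be set from the signal or read depth of the input pool.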






Example 6: Identification of virulence genes in Enterococcus faecalis



(a) Mutagenesis 

Mutagenesis of E. faecalis is accomplished using plasmid pAT112 or a
derivative developed for this purpose. pAT112 carries genes for selection
in both Gram-negative and Gram-positive bacteria, and the att site of
Tn1545. It therefore requires the presence of the integrase in the host strain
for transposition, and stable, single-copy insertions are obtained if the host
does not contain an excisionase gene (Trieu-Cuot et al (1991) Gene 106,
21). DNA flanking the integrated plasmid is recovered by restriction
digestion of genomic DNA, intramolecular ligation and transformation of
E. coli. The presence of single sites for restriction enzymes in pAT112 and
its derivatives (Trieu-Cuot et al (1991) Gene 106, 21) allows DNA sequence
tags to be incorporated prior to transfer, by conjugation, electroporation or
transformation, to a virulent strain of E. faecalis carrying plasmid pAT145
(to provide the integrase function) (Trieu-Cuot et al (1991) Gene 106, 21;
Wirth et al (1986) J. Bacteriol. 165, 831).

(b) Animal model

A large number of insertion mutants are analysed for random integration of
the plasmid by isolating DNA from transconjugants, followed by restriction
enzyme digestion and Southern hybridisation. Individual mutants are stored
in the wells of microtitre dishes, and the complexity and size of pooled
inocula are optimised prior to screening of the mutant bank. Two different
models of infection caused by E. faecalis are employed. The first is a
well-established rat model of endocarditis, involving tail vein injection of
up to 10^? cfu of E. faecalis into animals that have a catheter inserted
across the aortic valve (Whitman et al (1993) Antimicrobial Agents and
Chemotherapy 37, 1069). Animals are sacrificed at various times after
inoculation, and bacterial vegetations on the aortic valve are excised,
homogenized and plated on culture medium to recover bacterial colonies.
Virulent bacteria are also recovered from the blood at various times after
inoculation. The second model is of peritonitis in mice, following
intraperitoneal injection of up to 10^? cfu of E. faecalis (Chenoweth et al
(1990) Antimicrobial Agents and Chemotherapy 34, 1800). As with the
S. pneumoniae model, preliminary experiments are done to establish the
optimum complexity of the pools and the optimum inoculum level prior to
screening the mutant bank.
(c) Virulence gene identification

Isolation of DNA flanking the site of integration of pAT112 using its E. coli
origin of replication is simplified by the lack of sites in the vector for
most of the commonly used restriction enzymes with 6 bp recognition
sequences. DNA from the strains of interest is therefore digested with one
of these enzymes, self-ligated, transformed into E. coli and sequenced using
primers based on the sequences adjacent to the att sites on the plasmid. A
genomic DNA library of E. faecalis is probed with sequences of interest to
identify intact copies of virulence genes, which are then sequenced.
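The rescue strategy, cutting genomic DNA with an enzyme that has no site in the integrated vector so that the vector stays intact on one fragment together with flanking genomic DNA, can be sketched as follows. The enzyme table is a small illustrative subset of common 6-bp cutters; the code does not assert which of them lack sites in pAT112, and the vector and flanks are toy sequences. Cuts are modelled at the first base of each recognition site, ignoring the exact cut position and sticky ends.

```python
import re
from typing import Optional

# Illustrative subset of common 6-bp cutters. All of these sites are
# palindromic, so scanning the top strand finds every site.
SIX_CUTTERS = {
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG",
    "SalI": "GTCGAC", "XbaI": "TCTAGA", "KpnI": "GGTACC", "SmaI": "CCCGGG",
}


def sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence of `motif` in `seq`."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def non_cutters(vector: str) -> list[str]:
    """Enzymes in SIX_CUTTERS with no recognition site in the vector."""
    return [name for name, motif in SIX_CUTTERS.items() if not sites(vector, motif)]


def rescue_fragment(genome: str, insert_start: int, insert_end: int, motif: str) -> Optional[str]:
    """Fragment released by cutting at `motif` that carries the whole vector
    plus genomic DNA on both sides; self-ligation would circularise it."""
    positions = sites(genome, motif)
    left = [p for p in positions if p < insert_start]
    right = [p for p in positions if p >= insert_end]
    if not left or not right:
        return None  # no site on one side within the sequence provided
    return genome[max(left):min(right)]


# Toy sequences (hypothetical): a site-free vector between two EcoRI-bearing flanks.
vector = "TTTTGGGGAAAACCCC" * 3
left_flank = "AAAAAGAATTCCCCCCATAT"
right_flank = "GTGTGTGAATTCAAAAA"
genome = left_flank + vector + right_flank
start, end = len(left_flank), len(left_flank) + len(vector)

print(non_cutters(vector))
print(rescue_fragment(genome, start, end, "GAATTC"))
```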






Example 7: Identification of virulence genes in Pseudomonas aeruginosa



(a) Mutagenesis 

Since transposon Tn5 has been used by others to mutagenise Pseudomonas
aeruginosa, and the mini-Tn5 derivative that was used for the identification
of Salmonella typhimurium virulence genes (Example 1) is reported to be
broadly usable among Gram-negative bacteria, including several
pseudomonads (de Lorenzo and Timmis (1994) Methods Enzymol. 235,
386), a P. aeruginosa mutant bank is constructed using our existing pool of
signature-tagged mini-Tn5 transposons by conjugal transfer of the suicide
vector to one or more virulent (and possibly mucoid) recipient strains. This
approach represents a significant time saving. Other derivatives of Tn5
designed specifically for P. aeruginosa mutagenesis (Rella et al (1985) Gene
33, 293) may alternatively be employed with the mini-Tn5 transposon.

(b) Animal model and virulence gene identification 
The bank of P. aeruginosa insertion mutants is screened for attenuated
virulence in a chronic pulmonary infection model in rats. Suspensions of
P. aeruginosa cells are introduced into a bronchus following tracheotomy,
and disease develops over a 30-day period (Woods et al (1982) Infect.
Immun. 36, 1223). Bacteria are recovered by plating lung homogenates on
laboratory medium, and sequence tags from these are used to probe DNA
colony blots of the bacteria used as the inoculum. It is also possible to
subject the mutant bank to virulence tests in mouse models of endogenous
bacteremia (Hirakata et al (1992) Antimicrobial Agents and Chemotherapy
36, 1198) and cystic fibrosis (Davidson et al (1995) Nature Genetics 9, 351).
Cloning and sequencing of DNA flanking the transposons is done as
described in Example 1. Genomic DNA libraries for the isolation and
sequencing of intact copies of the genes are constructed by standard
methods.

Example 8: Identification of virulence genes in Aspergillus fumigatus 

(a) Mutagenesis

The functional equivalent of transposon mutagenesis in fungi is restriction
enzyme mediated integration (REMI) of transforming DNA (Schiestl and
Petes (1991) Proc. Natl. Acad. Sci. USA 88, 7585). In this process, fungal
cells are transformed with DNA fragments carrying a selectable marker in
the presence of a restriction enzyme, and single-copy integrations occur at
different genomic sites, defined by the target sequence of the restriction
enzyme. REMI has already been used successfully to isolate virulence
genes of Cochliobolus (Lu et al (1994) Proc. Natl. Acad. Sci. USA 91,
12649) and Ustilago (Bolker et al (1995) Mol. Gen. Genet. 248, 547), and
we have shown that incorporation of active restriction enzyme with a
plasmid encoding hygromycin resistance leads to single and apparently
random integration of the linear plasmid into the A. fumigatus genome.
Sequence tags are introduced into a convenient site in one of two vectors
for hygromycin resistance, and used to transform a clinical isolate of
A. fumigatus.
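Because REMI integrations are directed to occurrences of the enzyme's recognition sequence, the number of potential insertion sites in a genome can be estimated and a tagged bank simulated. The sketch is an idealisation under stated assumptions: BamHI (GGATCC) is an assumed enzyme choice, the genome is a random toy sequence with equal base frequencies, every site is taken to be equally likely to receive one insertion, and the tag names are hypothetical.

```python
import random
import re


def candidate_sites(genome: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of `site`.
    For a palindromic site such as GGATCC one strand covers both."""
    return [m.start() for m in re.finditer(f"(?={site})", genome)]


def expected_sites(length: int, site_len: int) -> float:
    """Expected matches of one fixed site in random sequence with equal base
    frequencies: (L - n + 1) / 4**n."""
    return (length - site_len + 1) / 4 ** site_len


def simulate_remi_bank(genome: str, site: str, n_mutants: int, seed: int = 0) -> dict[str, int]:
    """Idealised REMI: each hypothetically tagged transformant receives one
    insertion at a recognition site chosen uniformly at random."""
    rng = random.Random(seed)
    positions = candidate_sites(genome, site)
    if not positions:
        raise ValueError("site does not occur in genome")
    return {f"tag{i:03d}": rng.choice(positions) for i in range(n_mutants)}


toy_rng = random.Random(42)
genome = "".join(toy_rng.choice("ACGT") for _ in range(50_000))  # hypothetical toy genome
site = "GGATCC"  # BamHI, an assumed enzyme choice
observed = len(candidate_sites(genome, site))
print(observed, "sites observed;", round(expected_sites(len(genome), len(site)), 1), "expected")
bank = simulate_remi_bank(genome, site, n_mutants=20)
```

Real genomes deviate from equal base frequencies, so observed site counts can differ markedly from the random-sequence expectation.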
(b) Animal model and virulence gene identification
The low-dose model of aspergillosis in neutropenic mice in particular
closely matches the course of pulmonary disease in humans (Smith et al
(1994) Infect. Immun. 62, 5247). Mice are inoculated intranasally with up
to 1,000,000 conidiospores/mouse, and virulent fungal mutants are
recovered 7-10 days later by using lung homogenates to inoculate liquid
medium. Hyphae are collected after a few hours, and DNA is extracted from
them for amplification and labelling of tags to probe colony blots of
DNA from the pool of transformants comprising the inoculum. DNA from
the regions flanking the REMI insertion points is cloned by digesting the
transformant DNA with a restriction enzyme that cuts outside the REMI
vector, followed by self-ligation and transformation of E. coli. Primers based
on the known sequence of the plasmid are used to determine the adjacent
A. fumigatus DNA sequences. To prove that insertion of the vector caused
the avirulent phenotype, the recovered plasmid is recut with the same
restriction enzyme used for cloning and transformed back into the
wild-type A. fumigatus parent strain. Transformants that have arisen by
homologous recombination are then subjected to virulence tests.
CLAIMS

1. A method for identifying a microorganism having a reduced
adaptation to a particular environment comprising the steps of:
(1) providing a plurality of microorganisms each of which is
independently mutated by the insertional inactivation of a gene with a
nucleic acid comprising a unique marker sequence so that each mutant
contains a different marker sequence, or clones of the said microorganism;

(2) providing individually a stored sample of each mutant
produced by step (1) and providing individually stored nucleic acid
comprising the unique marker sequence from each individual mutant;

(3) introducing a plurality of mutants produced by step (1) into the
said particular environment and allowing those microorganisms which are
able to do so to grow in the said environment;

(4) retrieving microorganisms from the said environment or a
selected part thereof and isolating the nucleic acid from the retrieved
microorganisms;

(5) comparing any marker sequences in the nucleic acid isolated
in step (4) to the unique marker sequence of each individual mutant stored
as in step (2); and

(6) selecting an individual mutant which does not contain any of
the marker sequences as isolated in step (4).

2. A method according to Claim 1 wherein the plurality of
microorganisms as defined in step (1) is produced from a plurality of
microorganisms, each of which comprises a nucleic acid comprising a
unique marker sequence, by changing their condition from a first given
condition to a second given condition wherein (a) in the first given condition
the said nucleic acid comprising a unique marker is maintained episomally
and (b) in the second given condition the said nucleic acid comprising a
unique marker sequence insertionally inactivates a gene.
3. A method according to Claim 1 or 2 further comprising the steps:
(1A) removing auxotrophs from the plurality of mutants produced
in step (1); or

(6A) determining whether the mutant selected in step (6) is an
auxotroph; or

both (1A) and (6A).

4. A method of identifying a gene which allows a microorganism to 
adapt to a particular environment, the method comprising the method of any 
one of Claims 1 to 3 followed by the step: 

(7) isolating the insertionally-inactivated gene from the individual 
mutant selected in step (6). 

5. A method according to Claim 4 further comprising the step: 

(8) isolating from a wild-type microorganism the corresponding 
wild-type gene using the insertionally-inactivated gene isolated in step (7) 
as a probe. 

6. A method according to any one of Claims 1 to 5 wherein the 
particular environment is a differentiated multicellular organism. 

7. A method according to Claim 6 wherein the multicellular organism
is a plant.

8. A method according to Claim 6 wherein the multicellular organism
is a non-human animal.

9. A method according to Claim 8 wherein the animal is a mouse, rat,
rabbit, dog or monkey.

10. A method according to Claim 9 wherein the animal is a mouse. 

11. A method according to any one of Claims 6 to 10 wherein in step (4)
the microorganisms are retrieved from the said environment at a site remote
from the site of introduction in step (3).

12. A method according to any one of Claims 8 to 10 wherein in step (3)
the microorganism is introduced orally or intraperitoneally.

13. A method according to Claim 12 when dependent on Claim 8 or 9
wherein in step (4) the microorganisms are retrieved from the spleen.

14. A method according to any one of the preceding claims wherein the
microorganism is a bacterium.

15. A method according to any one of Claims 1 to 13 wherein the
microorganism is a fungus.

16. A method according to Claim 7 wherein the microorganism is a 
bacterium pathogenic to plants. 

17. A method according to Claim 7 wherein the microorganism is a
fungus pathogenic to plants.

18. A method according to any one of Claims 8 to 10 wherein the
microorganism is a bacterium pathogenic to animals.

19. A method according to any one of Claims 8 to 10 wherein the
microorganism is a fungus pathogenic to animals.

20. A method according to Claim 18 wherein the bacterium is any one
of Bordetella pertussis, Campylobacter jejuni, Clostridium botulinum,
Escherichia coli, Haemophilus ducreyi, Haemophilus influenzae,
Helicobacter pylori, Klebsiella pneumoniae, Legionella pneumophila,
Listeria spp., Neisseria gonorrhoeae, Neisseria meningitidis, Pseudomonas
spp., Salmonella spp., Shigella spp., Staphylococcus aureus, Streptococcus
pyogenes, Streptococcus pneumoniae, Vibrio spp., and Yersinia pestis.

21. A method according to Claim 19 wherein the fungus is any one of
Aspergillus spp., Cryptococcus neoformans and Histoplasma capsulatum.

22. A method according to any one of the preceding claims wherein in
step (1) the gene is insertionally inactivated using a transposon or
transposon-like element or other DNA sequence carrying a unique marker
sequence.

23. A method according to any one of the preceding claims wherein in
step (1) each different marker sequence is flanked on either side by
sequences common to each said nucleic acid.

24. A method according to Claim 23 wherein in step (2) the nucleic acid
comprising the unique marker is isolated using DNA amplification
techniques and oligonucleotide primers which hybridise to the said common
sequences.

25. A method according to Claim 23 or 24 wherein in step (4) the
nucleic acid comprising a plurality of said marker sequences is isolated
using DNA amplification techniques and oligonucleotide primers which
hybridise to the said common sequences.

26. A microorganism obtained using the method of any one of the
preceding claims.

27. A microorganism comprising a mutation in a gene identified using the
method of Claim 5.

28. A microorganism obtained according to Claim 26, when dependent
on Claim 8, or Claim 27 for use in a vaccine.

29. A vaccine comprising a microorganism according to Claim 26, when
dependent on Claim 8, or Claim 27 and a pharmaceutically-acceptable
carrier.

30. A gene obtained using the method of Claim 4 or 5.

31. A gene according to Claim 30 which is isolated from the Salmonella
typhimurium genome and hybridises to the sequence shown in Figure 5
under stringent conditions.

32. A gene according to Claim 30 which is isolated from the Salmonella
typhimurium genome and hybridises to a sequence shown in Figure 6 under
stringent conditions.

33. A polypeptide encoded by a gene according to any one of Claims 30
to 32.

34. A method of identifying a compound which reduces the ability of a
microorganism to adapt to a particular environment comprising the step of
selecting a compound which interferes with the function of a gene according
to any one of Claims 30 to 32 or a polypeptide according to Claim 33.

35. A compound identifiable by the method of Claim 34.

36. A compound according to Claim 35 wherein the particular
environment is a host organism.

37. A compound according to Claim 36 wherein the host organism is a
plant.

38. A compound according to Claim 36 wherein the host organism is an
animal.

39. Use of a compound according to any one of Claims 36 to 38
for treating infection of said host organism with said microorganism.

40. A molecule which selectively interacts with, and substantially inhibits
the function of, a gene according to any one of Claims 30 to 32 or a nucleic
acid product thereof.

41. A molecule according to Claim 40 which is an antisense nucleic acid
or nucleic acid derivative.

42. A molecule according to Claim 40 or 41 which is an antisense
oligonucleotide.

43. A molecule according to any one of Claims 40 to 42 for use in
medicine.

44. A method of treating a host which has, or is susceptible to, an
infection with a microorganism, the method comprising administering an
effective amount of a molecule or compound according to Claim 36 or 40
wherein said gene is present in said microorganism, or a close relative of
said microorganism.

45. A pharmaceutical composition comprising a molecule or compound
according to Claim 38 or 40 and a pharmaceutically acceptable carrier.

46. The VGC2 DNA of Salmonella typhimurium or a part thereof, or a
variant of said DNA or a variant of a part thereof.

47. A mutant bacterium wherein if the bacterium normally contains a
gene that is the same as or equivalent to a gene in VGC2, said gene is
mutated or absent in said mutant bacterium.

48. A method of making a bacterium according to Claim 47.

49. Use of a mutant bacterium according to Claim 47 in a vaccine.

50. A pharmaceutical composition comprising a bacterium according to
Claim 47 and a pharmaceutically acceptable carrier.

51. A polypeptide encoded by VGC2 DNA of Salmonella typhimurium
or a part thereof, or a variant of said polypeptide or a variant of a part
thereof.

52. A method of identifying a compound which reduces the ability of a
bacterium to infect or cause disease in a host comprising the step of
selecting a compound which interferes with the function of a gene in VGC2
according to Claim 46 or a polypeptide according to Claim 51.

53. A compound identifiable by the method of Claim 52.

54. A molecule which selectively interacts with, and substantially inhibits
the function of, a gene in VGC2 of Salmonella typhimurium or a nucleic
acid product thereof.

55. A molecule or compound according to Claim 53 or 54 for use in
medicine.

56. Any novel feature or combination of features disclosed herein. 
>J05534 Escherichia coli ATP-dependent clp protease proteolytic
component (clpP) gene, complete cds.
Length = 1236

Minus Strand HSPs:

Score = 453 (125.2 bits), Expect = 4.3e-28, P = 4.3e-28
Identities = 113/141 (80%), Positives = 113/141 (80%), Strand = Minus

Query: our Salmonella sequence; Sbjct: E. coli clpP gene

Query:  359 CCACCAGCCGCTCCCCTACCAGCCCCAGCCGACCCATATTGAAATTCACCCCCCCCAAAT 300
Sbjct:  785 CCAACCGTTGGGCGCCTACCAGCCCCAGGCCACCGATATCGAAATTCATGCCCGTGAAAT 844

Query:  299 TTTGAAAGTAAAAGCGCGCATGAATGAACTTATGRMKYKMMATACGGGTCANTCTCTTGA 240
Sbjct:  845 TCTGAAAGTTAAACGGCGCATGAATGAACTTATGGCGCTTCATACGGGTCAATCATTAGA 904

Query:  239 GCAGATTGAASGTGATACTGA 219
Sbjct:  905 ACAGATTGAACGTGATACCGA 925

Score = 231 (63.8 bits), Expect = 4.0e-24, Poisson P(2) = 4.0e-24
Identities = 55/66 (83%), Positives = 55/66 (83%), Strand = Minus

Query:  194 TGAAGCGCTAGAGTACCCTTTGCTTGACTCAATTTTGACCCATCGTAATTGATGCCCTGG 135
Sbjct:  950 TGAAGCGGTGGAATACGGTCTGGTCCATTCGATTCTGACCCATCCTAATTCATGCCAGAG 1009

Query:  134 ACGCAA 129
Sbjct: 1010 GCGCAA 1015

>ECCIPXGWA 223278 E. coli ClpX gene, complete cds
Length = 1945

Minus Strand HSPs:

Score = 364 (100.6 bits), Expect = 1.6e-20, P = 1.6e-20
Identities = 88/107 (82%), Positives = 88/107 (82%), Strand = Minus

Query:  325 CATATTCAAATTCACCCCCGCCAAATTTTCAAAGTAAAACGGCGCATGAATGAACTTATG 266
Sbjct:    1 GATATCGAAATTCATGCCCGTGAAATTCTGAAAGTTAAAGGGCGCATGAATGAACTTATC 60

Query:  265 RMKYKMMATACGGGTCANTCTCTTGAGCACATTGAASCTCATACTCA 219
Sbjct:   61 GCGCTTCATACGCGTCAATCATTAGAACAGATTGAACGTCATACCGA 107

Score = 231 (63.8 bits), Expect = 6.8e-24, Poisson P(2) = 6.8e-24
Identities = 55/66 (83%), Positives = 55/66 (83%), Strand = Minus

Query:  194 TGAAGCGGTAGAGTACGGTTTGGTTGACTCAATTTTGACCCATCGTAATTGATGCCCTGG 135
Sbjct:  132 TGAAGCGGTGGAATACCGTCTGGTCGATTCGATTCTGACCCATCGTAATTGATGCCACAG 191

Query:  134 ACGCAA 129
Sbjct:  192 GCGCAA 197

J. Biol. Chem. 265, 12536 (1990)

Figure 5
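The "Identities" lines in Figure 5 give the number of identical positions over the aligned length. A minimal sketch of that count for an ungapped segment follows; it is not BLAST's scoring, treats ambiguity codes such as R or N as mismatches, and uses toy strings rather than the sequences in the figure.

```python
def identities(query: str, subject: str) -> tuple[int, int, int]:
    """Return (matches, length, rounded percent) for an ungapped alignment.

    Every position where the two letters differ counts as a mismatch,
    including IUPAC ambiguity codes such as R or N (a simplification).
    """
    if len(query) != len(subject):
        raise ValueError("aligned segments must have equal length")
    matches = sum(q == s for q, s in zip(query.upper(), subject.upper()))
    return matches, len(query), round(100 * matches / len(query))


m, n, pct = identities("ACGTACGT", "ACGTTCGA")
print(f"Identities = {m}/{n} ({pct}%)")  # -> Identities = 6/8 (75%)
```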
A) New virulence factors with similarity to sequenced genes:

1. p1F10

similarity to clpP (E. coli)
(Figure 5 of application)

2. p2D6

similarity to lcrD (Yersinia spp.)
sequence p2D6_1_I

GGTCTTAATGTACGGGCATGGTCTGCATCGATAACTCCGGCACGCAAATCGCCATCGATACTCATTTGT 

TTGGCTGGCATCCCJlTCAAGCGAGAAACGTGCGCTAACTTCCGCCACCCTCTCGATACCTTTTGT 

ACAATAAATTGCACGATAGTAATGATGGTAAATACGACCAACCCAACGGTGAGATTTCCTCCTACGACA 

AACTTACCGAAAGCATCCACAAATATTACCGGCATTATGTTGTAACAGTACCCAGCCGTGATGTGCTGA 

TTGGGGAGTTAACAACCGATTTAT 

3. s4C3

probably same gene as p2D6, but different region

similarity to S. typhimurium invA and Yersinia spp. lcrD
sequence s4C3_1_U

GCGCGGACGCTAGTGTGGTGGGTGACAGCCAGACGTTACCGAACGGGATGGGGCAGATCTCrrGGCT^ 

CAAAAGACATGGCCCATAAGGCGCAAGGTTTTGGGACTGGACCTTTTCGCGGGCAGACAACGTATCT^ 

GTCTTATTAAAATGTGTCCTGCTTCGGCATATGTATCGAACCCTCGGAGCAAAGTCGTTTGGGCGCAGA 

ATTAGTACGTTTGGGTCGGTTGCTGTTATTCCTTGGGCTCGGAAAAAGAGTGCCAGCGTGAAGGAGTG^ 

GATTTGGCAGACTGGCCGCCTAAT 

sequence s4C3_1_R

CACTATAGGGAAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCTACTAGTCATATGGATTGCACTT 
GTGTATAAGAGTCAGGATTAGAGGACATGCGCCGGGAACCATACTATCTTTTTCCGGTGCTTCGACGCC 
ATTTGCGGAAACCACAGACTTTTTGCGGCGAATGAGGATAATTGGCAATGCTAACAACGCTGAAAAGJ^ 
AGCGAGAGTGATAAAAGGAAAGCCAGGAATTAAAGCGAGGAGCATTAAAACCACAGCGGCTAATATGAG 
CGACTGAGGTTGTCTGGCAATTTG 

4. p3F4 

similarity to invG (S. typhimurium)
sequence p3F4_1_U

TGCAGGCCGACTCTAGAGGATCCCCGGGTACCGGTAATTTCTTTAACCTCGCATCCCGGTGGATGAAAG 

GATATTCTGGCTGCGTAAGTAATGAATGAACCGCCCAGTAGATAAAATATTGAAAGTGATAACCTGATG 

TTTTAATAACGATGCAGGATATACATATAACATGCTGGCATCAAACCAGGTAAGCAAATCATATO 

TGCCAGGTTATTCAAACTATCGACCGGTGGTCCAGGCGGGAATTTTTCCACTAAATGTAGGTGGGATCA 

ATGGGCTAATTGGTATAGGCGGAT 

Figure 6 Sheet 1 of 5 
5. p7G2

similarity to yscC (Yersinia spp.)
sequence p7G2_1_U

CCTGTGATTCCCXyiTGAAATAGCTTTTACGAAAGCTGTCAGACNTGCTGAAGAATACCXrrGC^ 

AAGCTTGTAACTTTTGGGTATTGTTCCAACGCATGCTGAAACGGGTTATGGATATATTCGTCGCGGTGA 

GTTGATAGGAAATGACGCTTATGOVGTGGCTGAATTTGTGGAGJUUICCGGATATCGATACCK^ 

CTATTTCAAATavGGGGAAATATTACTGGCCTAGCGGCGATGTTTTTATTTCGCGC^ 

AAACGAATTAAACGTATCTATCACCCCCAAATTCATACAGCTTGTGAA 

sequence p7G2_3_0 



TTACTAAACAGGGCCCCGGACCaTGTAAACACCACGCtTGCCAAcACTAAAAAACGATGCtTGCcGTAA 
AAAAATTGAAcGTTATTT'ACTTAATAcGCCTATTTTATTTACATTATGCACGGACAGAGGGTGAGGATT 
AAATGGATAATATTGATAATAAGTATAcTCCACAGCTATGTAAAATTTTgGGGGcTATATCgGATtTGg 
TTGtTTtTAATTTAGCCtTATGGcTTtCACTAGGATGTGTCTATTTTTTTtGTGGtCAAGCACAGAGAT 
TTATTCCCCaACCACC 

sequence p7G2_1_I

TTTCCTTGCCGTGACAGTCCGGGATGCGAGGTTAACGAAATTACCGGCACCAAAGCTGTGGAGGTGAGC 
GGTGTCCCCAGCTGCCTGACTCGTATTAGTCAATTAGCTTCAGTGCTGGATAATGCGTTAATCAAACGA 
AAAGACAGTGCGGTGAGTGTAAGTATATACACGCTTAAGTATGCCACTGCGATGGATACCCAGTACCAT 
TATCGCGATCAGTCCGTCGTGGTTCCAGGGGTCGCCTAGTGTATTGCGTGAGATGAGTAACACCAGCGT 
CCCGACGTCATCGACGAACAATGG 



6. p9B7 



similarity to fHQ, invX (E. coli)
sequence p9B7_1_I

CATGAGTAACCTACCCAACTGTAATCTTTACCAATATGCATCATAATCTTCTGCTGGTAAATGA TTG^ 

AATATCGGAAAGGTAAGTGACATAAGCACGCCATTACGTAAAAGTGCGGCCCCTAAACTGCCACTTT^ 

AATAAGGGAAGTAATAAAGAAAGGCTCAATGGTCGJATAAAAGCCACAGCaUVTGCAATAAGCCACT 

TTTACCTGTTGTGCCATT'CAACCATGCTCTCCAATTCGTAACATTATCTGCCGGG 

ATACCGCTAAGCCATGGGTAG 

sequence p9B7_3_0 

ATTCCAGCCCCCGGGCOlTCTAACCACTATGAACAATCATCTTCrGGGTGGACAAT^ 

GGCCAGGCTTGTGCAATATGTATGTCATCACGTAAAAGCGCGGCCCCTTAATCTCCCCATTCTTCCOT 

AGGGCAGTTATCACGGCTGGCTCAATGGCCGGCTTAACAGCCACAG 

7. s6F5

similarity to yscU (Y. enterocolitica)
sequence s6F5_1_0

GAGGCGCGTCTTCGGTTGAGGGTCGCCCTCCAGATCTTTATGCTCCTGTTTTACGTCATCTTTACTCAT 
TTTAAGATCTTTTCTAATCTTATAATATTGAAAAGAATAGTCCAGTATGCCAACGACGAAATAAAGAAA 
CATCACCCCAACCCATAACCATTTTrrauVTGATGAAAGCACAAGCACGCCAC^ 

CGGAGGGGGCCGGAAAGTGCTGGGATCTTGATTAATGAAAAAGGCAAAGGGAAGAGATAGGATGATGCA 
TGCTGGTTGGAGGCAGATTATTCATCTTCG 

Figure 6 Sheet 2 of 5 
B) New sequences without similarity to entries in DNA or protein
databases:

1. s4D10

sequence s4D10_1_U

AGTTGCCGTATTTATTAAATATTCACCTCAGGTCAATATCXSAGGTCTTCCCCKX^ 
TACTAGAGATATCACTCCCTGGGTTGCAATACAGTACGATTACnTATCTTGAT 
ACyUlTGGCAGCTGACGTACCCGCGAGACAAACATTCTGGATTATGGACGTTATCAAC 
AACOTGGTGAACTGGTTGATCUUIATACCCCTATCCCTTGCATGTTATCGCTGACA^ 

AGCGGGCATCCTCGATCGGCT 
sequence 84D10_1_R

CAAGACy^CAGATCCAACTCGGGCCGATCGCCATAACGCCAGCAGTTTGAAAGATGAAAGCCCA 
CCAGCCATTCCGGTACAGCGTAACGAGCAGGTTGCCAGAAATAACGATAAAGTTGCAACACCTCGGGAT 
CAGGTCGGCTCAAAAACGGGGTCTCAGGCAAAAATAGCCGATCAGGATGCCCACTCCTAATAACAGTCC 
TGTCAACGATAACATCAACGGATAAGGGTATTTCATCAACCACTTCACCACCTTCCCTTTATTGG^ 

GGATAACGTCCATAATCCAGA 



2. s4B10

sequence s4B10_1_U

AGGGCTTTATTGATTCCATrrTTACACTGATGAATGTTCCGTTGCGCTGCCCGGATTACAGC 

TCTAGAGTCGACCTGCAGAACCGAGCCAGGAGCAAATTAATTTTTTTGGGCAATTGCTGAAAGATGAA 

CATCCACCAGTAACGCCAGTGCTTTATTACCGCAGGTTATGTTGACCAGACAAATAGATTA 

TAACGGTAGGCGTCGATTATCTTGTCAGAATATCAGGCGCAGCATCGCAAGCGCTTAATAAGCTGGGTA 

ACATGGCATGAAGGGGCAACCC 

sequence s4B10_1_R

CACTATAGGGAAAGCTTGCATGCCTGCAGGTCGACrCTAGAGGATCTACTAGTCATATGGATTC^^ 

CGGCCAGATCTGATCAAGAGACAGATCCAACTCGGGCCGATCGCCATAACGCCAGCAGTTTGAAAGATG 

AAAGCCCAGCTTATCCAGCCATTCCGGTACAGCGTAACGAGCAGGTTGCCAGAAATAACGATAAAGTTG 

CAACACCTCGGGATCAGGTCGGCTCAAAAACGGGGTCTCAGGCAAAAATAGCCGATCAGGATGCCCACT 

CCTAATAACAGTCCTGTCAACG 



3. p4G5 

sequence p4G5_1_0

cccccccccttctcctggcttacacagccccagaccggccx:tggaaaaggc« 
ggccagcaacatattttcacgcgccgccagatcgtggccgtaacccacggctttccgcagcg^ 
aatcatcgctatcgcgccaatcgccaggctgtcggtaaacggcgtggcgttgagcgcgctgtaggcctc 
aatcgoitgcgtcaacgcatcgataccggtcatcgccgtcacgtttggcggaacgccttcggtcac^ 

AGCATCAAGAATCGCCACGTCCGGC 
sequence p4G5_1_U

cgcgaacgtgcgccgcaactgcttgtggacggtgaattgcagtttgacgcc gctct cgtgccgga 

gccgcgcaaaaagcgcctgacagcccgctgcaaggccgcgccaacgtgatgattttcccgtcgctggag 

gcgggcaatattggctacaaaatcactcagcgtctgggaggctatcgcgctgttgggccgctaattcag 

gggcttggcgcgccgcttcacgacctctcccgaggctgtagcgtgcaggaaattatcgaactgcggttg 

gtgagaaaaccaa 
4. p7A3

sequence p7A3_1_U

CGCCCTAGCATGCCTGGCGTTGTCCGGTTATTGCTCGTCAAGCCyVACAGATGCAAAAGGTCyiGAGCGAC 

TCTCGAATCATGGGGGGTOITGTATCGGGATGGTGTAATCTGTGATGACTTATTGGTACGAGAAGTGCA 

GGATGTTTTGGATAAAAATGGGTTACCCGCATGCTGAAGTATCCAGCGAAGGGCCGGGGAGCGTCTT 

TTCATGATGATATACAAATGGATCAGCAATGGCGCAAGGTTCAACCATTACTTGCAGATA^^ 

TATTGCACTGGCAGATTAGTCACTCTC 

sequence p7A3_1_I



CCCTTCCCAGGCTCGACAGGTACACAGCCAGCCACTGGTGCAGGCAGTTACrrGCTTTC^ 

GGAGCAATATCCTGATATATTAAAGAAAGAGCGGGATCCCCITTCTTTACTGCTGCTAACCTTTCTT^ 

AAAATGCGTTGATGAGATTCATCCAGCACACCACTGATAACAAAAGAGCGCCGCATTGGCGTAACATTG 

ACAAGCCCCACTAAACCGCTCTCTATTATCGCAGAAATAATATCATCCCCCTGAGACTGATGAGAGTGA 

CTATTCTGCCAGCGCAAATAACCC 



5. p10E11
sequence p10E11_1

ATACCGAGTATTAAGCGGCTGTGTAACATCGTCATCCAACAACATACGCAGCGAGCCGCCACGCCGGAA 
AAACCGCATCGTGTCATGTGCCTGTTGTAGGGTCGGGTCTTTTTTCATGAGTACGTTTTCTGCGCTATC 
ATACTGGAAATTTCCCCCa^CTTACTGATAAGCCCTGTOVGTTGGGTAAGGACAGAGTTAAGCTCCTGA 
GACATTTTTTGGAATGGTTATCTTTCCCCGACTOVTAAAATCGGTATTCCCGCTGGGGGCAATATCC^ 
AGACGCTTTGGTCGCCCGTAGGGCACC 

sequence p10E11_U

GCCGTATGCCTGCAGTTGCCCGGTTATTGCTCGTO^GCGAACCGATGCCAAAGGTGAGAGCG^ 

GAATCATGGGGGGTCATGTATCGG<yiTGGTGTAATCTGTGATGACTTATTGGTACGAGAAGTGCAGGAT 

GTTTTGGTAAAAATGGGTTACCCCCATGCTGAAGTATCCAGCGAAGGGGCGGGGAGCGTGTTAATTCAC 

GATGATATTCAAATGGGTCAGCAATGGGGCAAGGTTCAACCCCCACTTGCAGATATTCCCCCCCC 

GGACTGGCAGATTAGTCACTCTCA 



6. s4B9 

sequence s4B9_1_0

GGGCGACCTGCCCGCGGCGCAACTTTCCCCGAAGCGTTTTCCATTTCCTTGTTCTT 

AGCTTACCTAAGCCTTGTCTTGCCTATGTGACAATACTGCTTGGAGAACACCCGGACGTCCATGA 

GCTATACAGATCACAGCGGATGGGGGATGGTGAATCGGTTATTATACCACAAGTCGCAGCTCTGAGCrr 

ATTGCTATTGAGATAGAAAAACACCCCGCTTCAACrTGGATTTTGAATAATGTAATACGCAA^ 

ACACTATATTCGGGTGGCGTATAA 

sequence s4B9_1_R

TTCGAGCTGGGGCACCGCTAATATCTTTAACCTCGCATCCCGGTGATGAAAGGATATTCT 
AGTAATGAATGAACCGCCCAGCAGATAAAATATTGACAGTGATAACCCGATGTTTTTTTAACGATGC^ 
GCTATACATATAAOVTAGCTCGCCACCAACACAGCTGAAGTAAATCATAITGTTGCTGCCAG 
CACACTATTGTCCGGCGGGCCAGCGGGGATTTTCCCCCTAAATCTCGCTGGTTCTCAAA 

7. p4F8 

sequence p4F8_1_I

AGTCTACGATTTCGCTATATCTTCTCTTAATCATGGCCGCCATTTGTGGATGCGATTTTA^AATATCCG 
GGCGATCTTTCATTAAAAAATAAAGATTCCCCATGACTTCACAGATAAAGGTATCGGTATTTTGAGTGA 
TACGTAACAATTCGTTCTCTTCGTGTGGGTCCATGATGCGAAGAATAATGGTGGCATCATTTTCATGAG 
GATTATGAACCCGAAATCTTTCTCTTTGCGATGCGCAGGCTAACTCTTTCAftCTCAAAAAA^
TAAGCCGCTCTCGTGTGGGGGCGC 

8. p7B8

sequence p7B8_1_0

GCGCCCCTTTAATTGGTTGAGGCGGCTGGTATTCTTGTAAGGGTAATACTAGCGAGACCCAGGTTCCAC 

CCCCGGCXyvCACTTTTTAGTGTCAGATTACCGCCCATCATTTTAGCCAGGCTTGACGCAATAGTC^ 

CAATTCCTGTACCTTGCGAATTTGTGTCTGCTTGATAAAAAGCAGAAAAGATTTGAGACTGCTGCT^^ 

TTTOUVTCCCCCCACCGCTATCGCTAACCAGAAATATTAATTGTTCCTCACCAAGATTGAGCGCC^ 

GTATCCCTCCCCCCTCGGGAAAT 

9. p8G12 

sequence p8G12_1_I

GGATAAGATCCCGGATAAGTATGTCAGGCTCGTATGCACAACAGGCATTATAAACCTCTAGACCATTTT 
TAACATGCTCTACTATTTTAAAATGAGGCCAGGGTAATAAGGCATTCATAATGCCGTTAATGATGATTT 
CATGATCGTCTACTAATAAGATCTTATATTCTTTCATTTGGCTGCCCTCGCGAAAATTAAGATAATATT 
AAGTAATGGTGTAGGTTGTGGAGATCATACGTATTTTCTGGCGTAAGTCGGTTAGTTCCTCCAGCGCGA 
TGATTTTCCCCATTTTTACGCGAT 

10. p9G4 

sequence p9G4_1_0

TTCCATATTGCTCGTCCGGGGAGCGTGTTAATTCTTGATGATATACCAATGGATCTGCAATGGCGCAAG 
GTTCAACCATTACTTGGAGATATTCCCGGGTTATTGTACTGCGAGATTAGTCACTCTCATCAGTCTCAG 
GGGGGTGATGTTATTTCTGGGATAATAGAGCAACGGCGTTAGCAGGGGTCGGTCAGTAGTCACGGCCAA 
CTTCGGTGCACTTTTGCGTATCACTGGGGTATCATAACTGAATCTCATCCCCCCCACTTTGGTAATCAC 
AC 

sequence p9G4_1_U

AATTCTTTTACCTCCATAAGCTGCGTGGCATAGCGATACAGAGTATTAAGCGGGTGTGTTACATCGTCA 
TCCAACAACATACGCAGCGAGCCGCCACGCCGGAAAAACCGCATCGTGTCATGTGCCTGTTGTAGGGTC 
GGGTCTTTTTTTCATGAGTACGTGTTCTGCGCTATCATACTGGAAATTTCCCCCCACTTACTGATAAGC 
CCTGTCAGTTGGGTAAGGACAGCGTTAAGCTCCTGAGACATTTTTTGAGTTGTTATCTGCCCCCCGACT 
CATAAGATCGGGTATTCCGCGGTGG 

11. p9B6 
sequence p9B6_1

ATATCCCTAATGCTTTTCCTTAAAATAAATACCACGGAAGGATACTGGCCACCTAGCCAAATTTAGAAA 

GCAATGAACATCCGGTTTATTCCTGAAAACGATTACTCCGGCGCACGTTGTTCTGGCGTTACCTGAGCC 

AGCAAACGATATAATGGGGTGGTGACCCGCATACCGGTCATTGGCATCCaVTCCACACCGGAGGG^ 

AAACTCATTAGGCCATAGGTAATATCATTAAGACGCrCTAATAAATGAGGGTGGGGGGCCCAAACTA 

ACTCCAGTATGTATTGAGTCA 

12. p6G5 

sequence p6G5_2_I 

CCCATGGGCGCAATTTGTTGCGCAGCGTTTACCCGACCATCGCGTTTATGAGCTGTAATTCATGGC^^ 

TAAAAACGGGCGTGACGACCCCAACGGAAGATAAGGCCGGGCTTAAACAGGAGATTATTGCrAATGCGC 

AGCGCAAAGTGTTGCTGGCGGACAGCAGTAAGTATGGCGCGCATTCGCTCTTTAATGTGGTGCCGCT^ 

AGCGCTTTAATGACGTGATTACCGACGTCAATCTGCCGCCGTCAGCGCAGGTTGAACTGAAAGGGCGCG 

CTTTTTGCCCTAACG 
DNA sequence of VGC II from centre to left-hand end
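In the printed figure, each line of the top strand is followed by the complementary strand and by three rows of one-letter amino-acid translation, labelled a, b and c. These rows can be derived mechanically from the top strand. The Python sketch below regenerates them under several assumptions. Rows a, b and c are taken to be the forward reading frames that start at nucleotides 1, 2 and 3. The standard genetic code is used, `*` marks a stop codon, and any codon containing a character other than A, C, G or T is shown as `X`. The function names are illustrative and do not come from the source.

```python
"""Regenerate the complementary strand and three forward-frame
translations of a top-strand DNA sequence (illustrative sketch)."""

BASES = "TCAG"
# Standard genetic code: one amino acid per codon, with the first, second
# and third codon positions each varying in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Watson-Crick pairing; after upper-casing, characters other than
# A, C, G and T (e.g. scanning debris) pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_strand(top: str) -> str:
    """Complementary strand aligned base-for-base under `top`.

    It reads 3'->5' left to right, as in a two-strand figure, so it is
    not reversed."""
    return top.upper().translate(COMPLEMENT)


def translate_frame(top: str, offset: int) -> str:
    """Translate `top` from 0-based `offset`, one letter per codon.

    Codons with any character other than A, C, G or T become 'X';
    trailing bases that do not fill a whole codon are ignored."""
    seq = top.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(offset, len(seq) - 2, 3)
    )


def three_frames(top: str) -> dict:
    """Rows a, b, c: forward frames starting at nucleotides 1, 2 and 3."""
    return {label: translate_frame(top, offset) for offset, label in enumerate("abc")}


if __name__ == "__main__":
    example = "ATGGCTAAATAG"  # hypothetical input, not taken from the figure
    print(complement_strand(example))  # TACCGATTTATC
    print(three_frames(example))       # {'a': 'MAK*', 'b': 'WLN', 'c': 'G*I'}
```

Each printed line holds 80 nucleotides, and 80 is not a multiple of 3. If the figure counts frames from nucleotide 1 of the whole sequence, the frame boundaries shift from one printed line to the next. Translating the concatenated sequence, rather than each line separately, keeps the frames consistent.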

CTCCACAACCGAGCC^GCAGCAAATTAATTTTTTTCAACAATTGCTGAAACATCAAGCATCCACCACTAACCCCACTCCT 

1-80



TTATTACCGCAGGTTATCTTGACCAGACAAATGGATTATATGCAGTTAACGCTACGCCTCGATTATCTTGCCAGAATATC 



81-160



AcGGCGaVGCATCCCAAGCCCTTAATAAGCTGGATAACMCGCATGAAGGTTCATCCTATACTATTTCTTACTGTCCTTA 

161-240

CCTTCTTTCTTACGGCATGTGATGTGGATCTTTATCGCTCATTGCCACAAGATGAACCGAATCAAATGCTGGCATTACTT 

241-320

Start yscJ*?

atgc.^catcatattgatgcgaaaaaaaacaggaagaggatggtctaaccttacgtctcgagcagtcggcagtttattaa 

321-400

start yscJ*

tccggttgaccctacttagacttaacccttatccgcatagggcagtttacaacggcggataagatgtttccgcctaatca 

401-480

GTTAGTCGTATCACCCCAGGAACAACACCCAGAAGATTAATTTTTTAAAAGAACAAACAATTGAAGGAATGCTGACTCAG 

481-560

ATGGAGGGCCGTGATTAATGCCAAAAGTCACCATTGCGCTACCGACTTATGATCACGGAACTAACCCTTCTCCGAGCTCA 

561-640

GTTCCCCTATTTATAAAATATTCACCTCAGCTCAATATGGACCCCTTTCGGGTAAAAATTAAAGATTTAATAGAGATGTC 

641-720

V.TCCCTCCGTTCCAATACA^.TAA^.ATTACTATCTTC^TGCAGCCTCCTCAATTCACAATCCT^CCTGACCTACCCCCC^

721-800

CACAPACATTCTGGATTATGGACCTTATCAACGCCAATAAACGCAACCTGGTGAACTCCTTGATCAAATACCCTTATCCC 

801-880

Tn insertion P11H11

TTGATGTTATCGTTGACAGGACTGTTATTAGGAGTCCGCATCCTGATCGCCTATTTTTGCCTGAGACGCCGTTTTTGACC 

881-960

CCACCTGATCCCGAGGTGTTGCAACTTTATCCTTATTTCTGGCAACCTGCTCGTTACCCTGTACCGGAATCCCTGGAT.V

961-1040

Tn insertion P11D10



GCTGGGCTTTCATCTTCAAACTGCTCGCCTTATGGCGATCCGCCCGACTTGGATCCTCTTCTTGACAGAGCGTTAAATAG 
1041-1120

acta.agaggaagctctgttattccagcctctttaaatgacaggcaaaaacggcaggttcctcttgcgccgcgtatatcg3

1121-1200

CArrTGCCTTTCGGCTCGGArrATTCAAACTCAGCTGTAGTGACTATTTTATGCTACCACAGTATCCCCAATTCCTTCTA 
1201-1280

start lcrC*

CAGTGCTTTAGCGAGGATGAGATCTCGCAGCTATATGGTTGGTTGGGCCAAAGAGATGGCWTTACTTCCTCCGCAACT 

1281-1360

GATGCAACAAACTGCATTCCACATCCGTACCCCCATTCTTAATCCGGAAGCGCATGACGATCCGGCTTTTACATGCGCTA 

1361-1440

TTftCTATTATTACCCCCTCCCCACCGTATACTTTCCCCCAAGACTTCTCTTACCCACATTATCTTCATCC^XXATTTCCT

1441-1520



ATGAGTTTTACTTCACTTCCTCTGACGCAAATTAACCATAAGCTACCCGCTCCAAATATTATTGAGTCACAGTCGATAAC 

1521-1600

ATTACAATTA;^CTTTATTTCCCCAAGAGCAACAAGCTAAGACAGTTTCACATCCTATTCTGACCTCCCCT7ACCGTAACG 

1601-1680

CTGA.vjvA^TCATCCGAGACGCCTATCGTTATCAGCGTGAACAGAAACTTGAGCAGC.^ACA.^CAACTAGCGTCCTTCCGT 

1681-1760



AAA^i^T ACGCT GG AAAA^AT GGAACTCG AAT GGCT GG AACAGCATGT AAAACATT T ACA^C ACG.-.TG AW.TC.\«»TTTCC 

1761-1840



TTCATTGGTCGATCACGCAGCGCATCATATTAAAAATAGTATAGAACAGGTTCTCTTGGCCTCGTTCGACCAACAGTCGC 
1841-1920

TAaACACTCTTATCTGCCATCGTCTGCCACCCCACCCCACGGCTATCGCGGAAGACGGAGCGCTTTATTTGCCTATTCAT

1921-2000

CCTCAA-AAAGAGCCATTGATCCGAGAAACTTTTCCCAACCGGTTTACGTTGATTATCCACCCTCCTTTCTCTCCCGATCA 

2001-2080

CGCTGAACTTTCCTCAACACGATATCCCCTTCAATTTTCACTTTCTCCTCATTTCAACGCCTTACTGAAATGCTTACCTA

2081-2160

ATGCTGAAGATA.A.AACAGCTAGCCATGAATATTAAAATTAATGAGATAAAAATGACGCCCCCTACAGCATTTACCCCTGr. 

2161-2240

CCACGTTATACACCAACAAGAGGTTATTTCCCCTTCAATCTTACCTCTCCACGAGTTACAGCAAACCACCCCCCCACCCC

2241-2320

TCTATGACACGATGGAAGAAATAGGAATGGCGCTCAGTCCTAAACTGCCCCAAAATTATAAATTCACTGATCCTCACAAA 

2321-2400



CTCCAGCOCAGACAGCAGGCTTTGCTCCCTTTGATAAAACAAATACAGGACCATAATCCCGCAACCTTCCCTCCCCTTAC 



2401-2480

CGAAG.-.G.-.-.rACTGATCCTGATTTACAGAATGCCTATCAAATTATCCCTCTTGCA.ATGGCGCTTACTCCCGCCGCGTTGT 

2481-2560

C AAAA-^AGA.AAA.VKCG CG ATTTGCAATCGCAACTGGAT ACGT TACAGCGGACCAGCGATCCGAACT TG CCGTTT

2561-2640

TACTGGA.ArTTGGCGAAGTGGATACCGTACGCTGTCCTCTCTGAAGCCTTTTATCCAA.CAGGCGATACACAACGATGAAA 

2641-2720

TGCCCTTA7CCCAGTCGTTCACACCCGTCCCAGACTGGCCCCATCGCTCTGAACGCGTCCGTATTTTGCTAAGAGCAGTA

2721-2800

G CCTTTG A;:.CT TACCATATGCATCGAACCCTCGGACCAAACTCGTrrCCCCCCAGCATTAC TACCTTTGCCTCGTTTCCT 

2801-2880

GTTATTCCTTGGCCTTGAAAAACACTGCCACCGTGACCACTCGATTTGCCACTTCCCCCCTAATACATTACTCCCCCTAC



2881-2960

TACTCGATATTATTTCTGAGCGCTCCCTTTTCAGTGATTCGTTGCTTCATACACrTACCCCTATACTTTCTTCATCGAAG 

2961-3040

WO 96/17951 PCT/GB95/02875

ATCTTCAATCGCTTACTCCAACAACTTGATCCGCACTTTATGCTGATACCCGATAACTCTTTTA.ACCACGA^CATCAACO 

3041-3120

TCAACAAATTCTCGAAACCCTTCGTGAAGTAAACATAMTCACCmTATTCTGATACCTCCCTTTCAATATTTAC^



3121-3200



ATTCGCTTTCTGGCTCATCATGAGGCGTCACCATGGATTOGGATCTCATTACTGAACCTAATATTCACCrTTTTATTCAA 

3201-3280

ttagcaggattagctcaacggcctttagcaaccaatatgttctggcggcaaggacaatatcaaactatcataacgctcct 

3281-3360

ATTCTCTTATGTCAGATACTCAAGCAAACCTTCTTAGACGAAGAACTGCTTTTTAA.AGCGTTGGCTAACTGG.2.AACCCGC

3361-3440

AGCGTTCCACGGTATTCCTCAACGATTArrTTTGTTCCGCGATCCCCTTCCAATGAGTTCTTCTCCACCTCTTTCCAGCT 

3441-3520

CCGCCGACCTCTGGTTACGATTACATCATCCACAAATAAAATrrCXTCCACTCCCAATCCGTTCATGGTTACGTGACCCA 

3521-3600



GTCACCGCGCAACAGTCGCTCACTGTATCCGCCGGTCGGCAGGATATCGTTCTCGCCACGCTGTTATTAATCGCTATTGT 
3601-3680

start lcrD*



3ATCATGCTGTTACCCTTGCCCACCTGGATGCTTCATATCCTCATTACTATCAACCTTATGTTTTCACTGATCCTGCTCT 
3681-3760

SUBSTITUTE SHEET (RULE 26)

TWTCCTATTTATCTTACTGACCCTCTCCATTTATCGCTATTTCCCTCTTTATTACTTATTACTACATT/iTATCGTTTr. 

3761-3840

TCACTCACAATCAGCACATCACGGCTCCTACTCTTACAACATAATGCCCCTAATATTCTCCATCCTTTCCGTAAGTTTCT 

3841-3920

CGTACGACCA.iATCTCACCGTTGGCTTGCTCCTATTTACCATCATTACTATCCTCCAATTTATTCTCATTACAAAAGGTA. 

3921-4000

TCGACAGGGTOGCCGAAGTTAGCGCACnTTTCTCCCTTGATGGGATGCCAGGCAAACA.\MGACTATCGATGGCGATTT:- 

4001-4080

Tn insertion P2D6

CGTGCCGG.^GTTATCCATCCACACCATGCCCGTACATTAAGACAGCATGTCCACCAGGAAAGCCGCTTTCTCGGTCCGAT 

4081-4160

GGACGGTGCGiTGAAATTTGTTAAAGCCGATACGATTCCCGGTATTATTGTTCTTCTCGTGAACATTATCGGCGGTATC-. 

4161-4240

TTATCGCT i TCGTAC AATATGATATGTCGATGACTGAGGCTCTTCACACTTATACCGTACTGTC^^TCGGAGATGGTT T -
4241-4320

TGTGGGCAVTTCCATCCCTCCrrGATTTCCCTTACCGCGGGAATTATTCTCACCCCTGTCCCCCGTCAG^.-ACGCCA^^ 

4321-4400

CCTGGCGACAGAGTTGAGTTCTCAAATTCCCAGACAACCTCACTCGCTCATATTAACCGCTGTCGTTTTAATCCTCCTCG 

4401-4480

CTTT.^ATTCCTCGCTTTCCTTTTATCACTCTCGCTTTCTTTTCAGCCTTGTTACCATTCCv^;v^TTATCCTCATTCCCCGC 
4481-4560

Tn insertion P11C3

AAAAACTCTGTGCTTTCCGCAAATGGCCTCGAAGCACCCCAAAAAGATACTATCCTTCCCGGCGCATCTCCTCTAATCTT 

4561-4640

ACGTCTTAGCCCCACCTTACATTCTGCCGACCTGATTCCTGATATTGACGCCATCAGATGGTTTTTATTTCAGGATACCG 

4641-4720

GCGTCCCTCTCCCTGAGGTGAATATTGAGGTTTTGCCTGAACCCACCGAAAAATTGACGGTACTGCTATATCAGGAACC' 

4721-4800

GTATTTAGTTTATCTATTCCCGCTCACGCGGATTATTTATTCATAGGCGCGGACGCTACTGTGGTGCCTCACAGCCACAC 
4801-4880

GTTACCGAACCGGATGGCGCAGATCTGTTCGCTTACAAAAGACATGGCCCATAAGGCGCAAGCTTTTCCACTGGACCTTr 
4881-4960

TCCCCCGCAGCCAACGTATCTCTCCCTTATTAAAATCTGTCCTCCTTCCCCATATCCCAGAGTTTATTCCTCTTCACGA^ 

4961-5040

ACCCCTTATCTAATCAATGCGATCCAAAAAAACTACTCTGAGCTGGTGAAAGAGCTTCACCOCCAGTTACCCATTAAT.V 

5041-5120

AA7CGCTCAAACTrrCCAACGGCTTGTATCAGAGCGGG77TC7AT7ACAGATTTACCTCTTATTTTCGCCACC77AAT7v: 

5121-5200

ACTCGGCCCCACGTCAAWVACATGTCCTGATGTTCACACAATATGTCCCTATCCCGCTTCGTCGTCATATTCTGCGTCCT 

5201-5280

CTTAATCCCGAACCAAAACCGCTCCCGATTTTCCCCATCCGCCAAGGTATTGAAAACCTCCTCCCTGAATCCATTCGCCA 

5281-5360

GACGGCAATGGCGACCTATACTCCCCTGTCGTCTCGTCATAAGACGCAGATCCTCCAACTTATCGAGCACGCCCTGAA.GC 

5361-5440

actcagcca.^ttattcattgtcacttctgtcgacacccgacgtttcttgcgaaaaattacagaagccacct7:ttcgac 

5441-5520

gtaccgattttctcatggcaggaattaggaca'ggagagccttatacaagtggtagaaagtattgaccttaccg.-.=.gacg.-. 

5521-5600

GTTGGCGGACAATGAAGAATGAATTGATGCAACGTCTCACCCTGAAATATCCGCCCCCCGATGCTTATTCTCGATGGGGC 

5601-5680

end lcrD*, start yscN*?

CGAATTCAGGATGTCACCGCAACGTTGTTAAATGCGTCCTTGCCTCCCGTATTTATGCGCGAGTTCTGC7CTAT.-AAGC- 

5681-5760

TGCACAAGAACTTGCTGAACTCGTCCCCATTAATCGCAGCAAAGCTTTGCTATCTCCTTTTACGAGTACAATCCGGCTTC 

5761-5840

yscN*

ACTGCGCGCAGCAAGTGATGCCCTTAACCGACCCCATCACGTTCCCCTCCGCGAAGCGTTATTAGGCCCAGTTATTGATC 

5841-5920

CCTTTCGTCCTCCCCTTGATCCCCGCCAACTCCCCCACCTCTCCTCCAAACACTATGATCCAATCCCTCCTCCCr.CMT:; 

5921-6000

CTTCGACAGCCTATCACTCAACCATTAATCACGGGGATTCGCCCTATTGATAGCGrrCCCACCTCTGCCGAACCCCAACG
6001-6080

AGTGGGTATTTTTTCTGCTCCTGGCGTGGGGAAAAGCACGCTTCTGGCCATCCTCTGTAATGCGCCAGACGCAGACAGCA 

6081-6160

ATCTTCTGGTGTTAATTGGTCA^CGTGGACCAG.^AGTCCGCGAATTCATCGATTTTACACTGTCTGAAGAGACCCG.-.V.:^ 

6161-6240



CGTTGTGTrATTGTTGTCCCAACCTCTGACAGACCCGCCTTAGAGCGCGTGAGGGCGCTCTTTGTGCCCACCACGATACC 

6241-6320



AGAATTTTTTCGCGATAATGGAAAGCGAGTCCTCTTGCTTGCCGACTCACTCACGCGTTATGCCA6GGCCGCACGG;kAAT 
6321-6400

CGCTCTCCCGCCGaAGAGACCGCCGTTTCTGGAGAATATCCCCAGCCGTATTTAGTCCATTGCCACCACTTTTAGr.^CGT 
6401-6480

ACCCGAATGGOAOAAA.^AGGCAGTATTACCCCATTTTATACCGTACTGCTCGAAGGCGATCATATCAATGAAGCCCTTCG 
6481-6560

yscN*

CGCATCAAGTCCGTTCACTGCTTGATGGACATATTCTACTATCCCCACGGCTTGCAGAGAGCGCCCATTATCCTCCCATT 
6561-6640

GACGTGTTCCCAACGCTCAGCCGCCTTTTTCCACiTCCTTACCAGCCATGAGCATCCTCAACTGCCCCCCATATTC-CAC-
6641-6720

CTCCCTCGCGCTTTACCAGG ACCTTG AACTCT T AATACCCAT TCGGC AATACCAGCCACGAGT TGATACACAT AC TCAC A 

6721-6800

AACCCATTGATACCTATCCGGATATTTGC^CATTTTTGCGACAA^CTAAGCATGAAGTATCCCGACCCGAGCTACTTAT.> 

6801-6880

GAAAAATTACACCAAATACTCACCGAGTGATCATGGAAACTTTCCTGCAGATAATCCCCCCCCTCAAAAGCAATTACGC: 

6881-6960

end yscN* yscO*

GCAAGCTTACCGTACTTGATCAGCAGCAACAGGCGATTATTACGGAACACCAGATTTGCCAGACGCCCGCTTTAGCACTC 

6961 * - — ♦ 7040 

CGTTCGAATGGCATGAACTACTCGTCGTTGTCCGCTAATAATGCCTTCTCGTCTAAACGGTCTCCGCGCGAAt.TCGTCAC 

^ ASLPYLI S SNRRLLRNSRFARRAL' QC- 

t> OAYRT'SAATGOYYGTADLPDARrSS V- 

- KLTVLDOOQOAIITEOOICOTRALAV 

TCTACCAGACTGAAAGAATTAATGGGCTGGCAAGGTACGTTATCTTGTCATTTATTGTTGGATAACAAACAACAA^TGGC 
'041 - - ♦ ♦ -- 

AGATGGTCTGACTTTCTTAATTACCCGACCGTTCCATGC.^TAGAACAGTAAATAACAACCTATTCTTTGTTGTTTACCC- 

LPD' KN - WAGKVRVLV{YCHIkNNKV/r- 
^ YOTERlNGLARYVlLSriVr, ►ET7NG 

STRLKELMGWOCTLSCHLLLDKKOQM A- 

CGGCTTATTCACTCAGGCGCAGAGCTTTTTGACCCAACGGCAACCAGTTAGACAATCAGTATCAGCAGCTTGTCTCCCC: 

I'l - 

CCCCAATAAGTGAGTCCGCGTCTCGAAAAACTGCGTTGCCGTTCGTCAATCTCTTAGTCATAGTCCTCGAACAGA3GGCC 

* CYSLRRRAr'RNGKOLENOYOQLVSR 

t> RVIHSGAELFOATASS-RISISSLSPC- 
C GLFTOAOSrLTOROAVRESVSAACLPi- 

CGAACCGAATTACAGAACMTmAATCCGCTTATGA^^AAGAAAGAAAAAATTACTATGGTATTAACCGATGCGTATT^ 

'201 ♦ ♦ - T^er 

CCTTCGCTTAATGTCTTCTTAAAATTACGCGAATACTTTTTCTTTCTTTTTTAATGATACCATA^TTCGCTACGCATAAT 

end yscO* scare yscp- 

* RSCLOKNrNALMKKKEKlTMVLSOAY'k- 
t> EANYRRILMRL'KRKKKLLWY'A M P i ' - 
C K R ! T r E r * CAYEKERKNYYClKRCVL 

CCAAAGTTCAGGGAAGTCTTCGGTTGCCATGCCACTCTTATCAGCATCATAACGAGGCCGAGCCGGAACCTATCOACTTT 

^?B1 3o- 

CGTTTCAACTCCCrrCAGA.ACCCAACGCTACGGTCAGAATACTCCTACTAtTGCTCCCCCTCCGCCTTCCATACCTGAAA 

a 0£*GKSHVAMHVLSG''HCGGGTyGL*- 

b KVEG5LGLPCC5VQDDN£AEAERMDr 

c pKLPEVLGCHASLlHMlTRRkKNVWTL- 
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CAACAACTCATGCACCAGCCATTACCCATTGCTCACAATAATCCTCCTGCAGCATTGAATAACAACGTCUTTTTCACCCA 

H6I ♦ » ♦ * J4|f, 

CrrTGTTCAGTACGTGCTCCCTAATGGCTAACCACTCTTATTAGGACGACCTCCTAACTTATTCTTCCACCAAAACTCCCT 

TTH tpG;THW»£* SSCSIE'tHCFHA 
EOLMHOALPIGENNPPAALNKNVvrTO- 
NNSCTRMYPLVRI ILLOH" IRTWfSRN- 

ACGTTATCGTGTTAGTGCCGCTTATCTTGACGGTCTACACTGTGAAGTATCTCAATCACGGGGGCTAATCCAGTTAAGAA 

UU - * ♦ ♦ * ^b20 

TGCAATAGCACAATCACCGCCAATAGAACTGCCACATCTCACACTTCATACACTTAGTCCCCCCGATTACGTCAATTCTT 

TLSC'HRLS* RCRV* SM^ IRGANPVKri- 
RYRVSCCYLDGVECEVCESGGLIOLR I- 
VIVLVAVl LTV* SVKYVNOGG* SS' Z 

7CAATCTCCCTCATCATGAAATTTACCGTTCGATGAAAGCCCTAAAGCAGTGGCTGGAGTCTCAGTTGCTGCATATGGGC 

1521 * • ♦ * • * * * ''^OO 

ACTTACAGCCAGTAGTACTTTAAATGCCAAGCTACTTTCGCGATTTCCTCACCGACCTCAGAGTCAACGACGTATACCCC 

QCPSS* NLPFDESAKAVACVSVAAYG V- 
NVPHHEIYRSMKALKOWLCSOLLHJ^G 
SMSLIMKFTVR' KB' S^GWSLSCCIV.'G- 

TATATAATTrcrCTGGACATATTCTATGTTAAGAATAGCGAATGAACACCGTCCGTGGGTCGAGATACTTCCAACGCA/.G 

?60l • . , • ♦ ♦ fttdO 

ATATATTAA.AGGGACCTCTATAAGATACAATTCTTATCGCTTACTTCTCGCAGGCACCCACCTCTATGAAGGTTGCGTTC 

end yscP* start yscQ*? 

YMrFGDlLC'E'RMKSV'RGWRYrCRK 
Y I I 3 L E I F V V K fi <: g' ' RASVGGOTSNi.R - 
I • r P W R Y S M « R ! A M EERPWVEILPTOC- 

GCGCTACCATTGGTGAGCTGACATTCAGTATGCAACAATATCCAGTACACCAAGGGACATTATTTACCATAAATTATCAT 

7681 - -* ♦ ♦ ♦ 7^60 

CGCCATGGTAACCACTCGACTGTAACTCATACGTTGTTATAGGTCATGTCGTTCCCTGTAATAAATGGTATTT.aATAGTA 

Start yscQ*? 

ALFL VS'H* VCNNlOYSKGHYtP* I 11- 
RYMW*ADIEYATISST .^ RDI lYHKLS^- 
ATiGELTLS H O f> Y P VQOGTLFTIH rri 

AATGAGCTGGGTAGCGTGTGGATTGCACAACAATCCTGGCAGCGCTGGTGTGAAGGGCTAATTGGCACCGCTAA.TCGATC 

7161 • ♦ - * -8^0 

TTftCTCGACCCATCCCACACCTAACGTCTTCTTACGACCGTCGCGACCACACTTCCCGATTAACCGTGCCCATTiCCTAS 

M S w G C G I. 0 N M A G S A G V K G • L A P L I D « • 

AG - GVDCRTMLAALV- RANWHR' 
NELGRVWIAEOCWQRWCtGL IGTAN^.- - 

GGCTATCGATCCTCAATTGCTATATGGAATACCTGAATGCGGCCTCGCGCCCTTATTGCAAGCCAGTGATGCAACCCTCT 

7BM * * ♦ 

CCGATAGCTAGGACTTAACCATATACCTTATCCACTTACCCCCCACCGCGGCAATAACGTTCGGTCACTACCTTGGGAGi 

IS I LNCYME* LHGGWRSYCKPVMO?? 
GYRS' lAlWNS* MGAGAV lASO' CNPL • 
AI orELLYCIAEWGLAPLLOASDATLT- 

GTCAGAACGAGCCGCCAACATCCTGCAGTAATCTACCACATCACCTAGCCTTGCATATTAAATCGACAGTTCAACAGCAT 

7 921 ♦ — • * * 

CAGTCTTCCTCGCCGGTTGTACGACGTCATTAGATGGTGTAGTCGATCGCAACCTATAATTTACCTGTCAACTTCTCGTA 

VRTS'^OHPAVIYHlS* RCI LNGOtKSM- 
SEhAANlL0*STTSASVAY-MOS - R \ - - 
ONEPPTSCSNLPHOLALMIKWTVEEM 

GAGTTCCATAGCATTATTTTTACATGGCCAACCCCTTTTTTGCGCAATATAGTCGGAGAGCTTTCTGCTGAGCGACAACA 

8001 - * 

CTCAAGGTATCCTAATAAAAATGTACCGGTTGCCCAAAAAACGCGTTATATCACCCTCTCGAAAGACGACTCCCTGTTCT 

SS I ALrLHGOP^'fCAI' StSFLLSDNk- 
V p • V V F Y M A N G r F A 0 t S R R A F C • A T " 
t F U :\ : r T W P T G F L R N I V 0 C L S A E R 0 * V 
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GATTTATCCTGCCCCTCCTCTCCTACTCCCTCTATATTCAGCCTCXTCXrCACCTTACATTAATCGAACTTCACTCTATCr. 

»08l 8l€C 

CTAAATACCACCCGCAGCACACCATCAGGCACATATAAGTCCGACCACGCTCCAATGTAATTAGCTTGAACTCACATACC 

ri LPLLW - JL YIOAGASLH* SNLSt. S 
nLSCPSCGSPCirRLVPAYlNRT' V YR- 
I t PAPPVVVPVYSGWCOLTLI ELCSI C- 

AAATCCGCATGCGCGTTCGGATTCATTCCTTCGCCGACATCAGACTCGCTTTTTTTCCTATTCAACTACCTGGGCGAATC 

♦ ^ » 8240 

TTTAGCCGTACCCGCAAGCCTAAGTAACGAACCCGCTGTAGTCTGAGCCAAAAAAACGATAAGTTGATCCACCCCCTTAG 

KSAHArGriASATSOSVrLLFNYLGES- 
MRMGRSDSLLRRHOTRFFCYSTTWGN L- 
IGMGVRIHCrODlRLGrrAIOtPCGl 

TACCCAAGCGTGTTGCTGACAGAGGATAACACGATCAAATTTGACGAATTAGTCCAGGATATCGAAACGCTACTTGCCTC 

* * 8320 

ATCCCTTCCCACAACGACTGTCTCCTATTGTCCTACTTTAAACTGCTTAATCAGCTCCTATAGCTTTCCCATCAACGCAG 

TOGCC* ORITR - NLTN* SRISKRYLRQ- 
aKCVADRG'HDCI'RISPCYRNAT CV 
YARVLLTEDNTMKrOELVOOI ETLcAS- 

AGGGAGCCCA.ATGTCAAAGAGTGACCGAACGTCTrCAGTCGAA.CTTGAGCACATACCACAACAGGTGCTCTTTGA.CGTCG 

♦ 

TCCCTCGGGTTACAGTTTCTCACTGCCTTGCAGAAGTCACCTTGAACTCGTCTATGGTGTTGTCCACGAGAAACTCCAGC 

3A0C0RVTERLCSrJLSRYHNRCSLRS 
REPNVKE* RNVrSRT" AOTTTGAL* GR- 
G.S PMSKSDGTSSVELEOI P(?OVLf£ VG- 

CACGTGCCAGTCTGGAAATTGGACAATTACGAC.iw^CTTAAAACGGGGGACGTTTTGCCTGTAGGTGGATGTTTTGCGCCA 

fl^Ol 9480 

CTGC AC GCTCAGACCTTT AACCTGTTA.ATGCTGTT GAATTTTGCCCCCTGC AAAACGGACATC CACCT ACAAAACGCCGT 

DVRVWKLDNYDr:LKRGTrCL' VDVLRQ- 
TCESGNWTITTT' NGGRFACRWMFCAR- 
RASLEIGOLRQLKTGDVLPVGGCFAP 

GACCTGACGATAAGAGTAAATGACCGTATTATTGGGCAACCTGACTTGATTCCCTGTGGCAATCAATTTATGGTGCCTAT 

* * ebec 

CTCCACTCCTATTCTCATTTACTGGCATA.ATA.ACCCGTTCCACTCAACTAACGCACACCCTTACTTAAATACCACGCATA 

R* E-MTVLLGKVS' LPVAMNLKCVL- 
GODKSK - PYVWAR'VOCLWO' lYGAY 
tvT IRVNORl : 3>?GELIACGMEfn VRt - 

TACACGTTGGTATCTTTGCAAu^AATACACCGT-V.ACCTGATAAGAAAAATAATATGCG.tACAATATAATACCCTTCC^^ 

ATGTGCA^CCATACAAACCTTTTTATGTCGCATTTGGACTATTCTTTTTATTATACGCTTGTTATATTATCGCAAGGTCC 

end yscO* 

HVGl FAKIQRKPDKKNNMRT!' • RSR 
Y T L V S L 0 K Y S V K L r R K 1 I C E v Y N S V ? G - 
T R W Y L C K N T A ■ T* ' EK* YANN I lAFOV- 

TCGTGTCATGAGAGATACACTATCTCTTTACCCCATTCCCCTTTCaVACTGATTGGTATATTGTTTCTGCTTTC^ 

864 \ * — 4 ♦ . — 8'':r 

AGCACAGTACTCTCTATGTCATACACAAATCCGCTAAGCCGAAACGTTGACTAACCATAT.AACA.AAGACGAA.ACTTATCA 

Start y5c/?'? 

S C M r P Y S M y L P P FPLQLCGI LfLLS: L- 
RVMRDTVCLYPIRLCN* LVYCrCfOYC- 
VS* EIQYVFTRrAFATDWY IVSAFNT 

GCCTCTCATTATCGTCATGGCAACTT(rrTTCCTT.^AACTCGCGCTGCTATTTTCGATTTTACGAA.ATGCTCTGGGTATTC 

8721 - - B300 

CGCACAGTAATAGCAGTACCCTTGAAGAAAGGAATTTGACCCCCACCATAAAAGCTAAAATGCTTTACGAGACCCATAAG 

PL I V K G T S r L K A V V F S 1 I. R N A L G 1 Q - 

L 5 L S S W t L L 5 L W R W Y F R F 1 F. M L K V F 

ASHYRMCNrr?* ~CGGi FnrTKfscts- 
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AAC\^oT ^XCC 'ICAAATATCGCACTGTATCGCCTTCCCCTTCTACTTTCCT TATTCATT ATGGCGCCCACCCTAT TACC^ 

8801 8880 

TTCTTCAGGGCCCTTTATACCCTGACATACCGGAACGCCAACATGAAACGAATAAGTAATACCCCCCCTCCCATAATCCA 

OvPPrilALYCtALVLSLFlMGPTLLA 
NKSPOISHCMALRLYFPYSLWGRRr' t- 
TSPkKYRTVHPCACTrLI HYGADAl SC- 

GTAAAAGAGCGCrCCCATCCGCTTCAGCTCGCTGGCCCTCCTTTCTGGACGTCTGACTGGGACAGTAAAGCATTACCCCC 

8B81 • ♦ ♦ ♦ ♦ ♦ ♦ B960 

CATTTTCTCGCCACCGTAGCCCAAGTCCACCCACCCCGAGGAAACACCTCCAGACTCACCCTGTCATTTCGTAATCGCGG 

VKtRKHPVOVACAPFWTSEHDSKALAP- 

• KSA OIRFRSLALLSCRLSGTVKH* RL- 
KRAl. ASGSCRHRSrLDV»VGO*SlSA 

T TAT CCACACT TTTTGCAAAAAAACTCTGAAGAGAACGAACCCAATT ATTTTCGGAATTTGAT AAAACCAACCT 

8961 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 9040 

AATAGCTCTC.Vva.^ACCTTTTTTTGAGACTTCTCTTCCTTCGGTTAATAAAACCCTTAAACTATTTTCCTTGGACCCCAC 

yROrLOKNSEEKEANTFRNL I KRTWp E- 
IDSrCKKTLKRRKPlIFCI'-NtPGl, 
LSTV-AKKL* REGSOLFSEFOKTNLt* 

A.'WJACATAA.ii^GAAAGATAAAACCTGATTCrrTGCTCATATTAATTCCGGCATTTACGCTGAGTCAGTTAACGCf^ 

9041 ♦ - ♦ ♦ — ?: 

ttctgtattttt:tttctattttgcactaagaaaccactataattaagcccctaaatcccactcactcaattccctccgt 

oikp. .kikpdsllilipaftvsoltoa 

KT' KZH'tlLllCSt* FRHLR* VS* RRH- 
RMKKKOKT' FFAHlNSGlYGESVNACl- 

tttcgcattgcattacttatttatcttccctttctcgctattcacctccttatttcaaatatactgctggctatggggat 

9121 ♦-- ♦ ♦ 9?00 

AAAGCCTAACCT.-ATGAATAAATAGAAGGGAAAa=iCCCATAACTGGACCAATAAAGTTTATATCACGACCGATACCCCTA 

FRIGI^LIYLPFLAI DLLI SMI LLAMGM- 
FGLCrtri FPFWLLTCLFOIYCWLWG'- 
SDWITYLSSLSGY* PAtFKYTAGYCD 

GATGATGCTGTCC CCCATCACCATTTCATT ACCCT TTAACCTCCTAATATTTTTACTGGCAGGCCGTTGCCAT CTGAC AC 

9701 . ♦ ♦ ♦ ♦ ♦ 9230 

CTACTACCACAGCCGCTACTGGTAAAGTAATGGCAAATTCGACGATTATAAAAATGACCGTCCCCCAACCCTAGACTCTG 

MMVS?MT I SLPFKLL t FLLAGCWOLT L- 

* WCKR* PFHYRLSC* YFYW QAVCI* H 

ddgvaohh r I tv- aan i ftg r r lgs :t - 

TGGCGC A.AT TG GTACAGAGCTTTTCATG AATGATTCTGAATT GACGCAATTTGTAACGCAACTTTTATGGATCGTCCT TT 

9281 * ?3c.O 

ACCGCGTTAACCATCTCTCGAAAAGTACTTACTAACACTTAACTGCGTTAAACATTGCCTTGA.^AATACCTAGCAGGAAA 

end yscft* start yscS* 

ACL V O S r g * MILN'RNL*RNFYGSSF 

wrnwyrafhc* f* ioaicnatfmorpf- 

G A I G T E L F M M D g E L TOFVTQLLWIVLF- 

TTACGTCTATCCCGGTAGTGTTCGTGGCATCGGTAGTTCGTCTCATCGTAAGCCTTCTTCACGCCTTGACTCAAATACAG 

9361 ♦ ♦ ♦ • ♦ 9440 

AATGCACATACCCCCATCACAACCACCGTACCCATCAACCACAGTACCATTCGGAACAAGTCCGCAACTGAGTTTATCTC 

LRLC?* CWWHR' .UVSS' ALFRP* LKVR • 
YVYAGSVGC I C SWCHRKPCSGLDSNTG- 

tsmpvvlvasvvgvivslvoaltoio 
gaccaaacgctacacttcatcattaaattattggcaattgcaataaccttaatggtcagctacccatggcttaccgctat 

9441 * *- — ♦ ♦ 9520 

ctcctttgcgatctcaagtactaatttaataaccgttaacgttattggaattaccagtcgatgcgtaccgaatcgccata 

TKRYSS' LNYWOLO* P'WSATHGLAVS- 
PNATVHC.'IIGNCNNLNGOLPMA'Ry 
OQTLOFM lKLLAI A.rTLMVSYPWLSGI - 
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cctcttg;>-^ttataccccccacataatcttacca.^ttggacagcatggttgaatcccacaacaggtaaatgactcgctta 

0^71 ♦ ♦ ♦ — ♦ 960i* 

GGACAACTTPATATGCGCCGTCTATTACAATGCTTAACCTCTCCTACCAACTTACCGTCTTCTCCATTTACTCACCCAAT 

end yscS* scare yscT- 

C'lIPGft'CYELESMVEWHNR-MSCL 
PVEL YPAONVTNWRAWLHGTTCK' VAY- 
LLNYTRQIMLR i r. r » C, - f1 A P P — )L — C w L 1 - 

TTGCATTGCCTGTCGCTTTTATTCGACCATTCAGCCTTTCrTTATTACTTCCCTTATTAAAAACTCGCAGmAGCW 

^eOl ♦ ♦ ♦ ♦ , 9680 

AACGTAACCa^CACCCAAAATAAGCTGGTAACTCGGAAACAAATAATCAAGCGAATAATTTTTCACCCTCAAATCCCCCG 

LHWLWULrDH - ATLYYTPY* KVAV'GP- 
CIGCGTYST lEPrFITSLIKKWOFRCR- 
ALAVAFIRPLSLSLLLPLLKSCSLCA 

GCACTTTTACCT.AATGGCGTGCTTATGTCACTTACCTTTCCGATATTACCAATCATTTACCAGCAGAAGATTATCATCCA 

96tJ» - ♦ 

CCTGAAAATGCATTACCGCACGAATACAGTGAATGCAAAGGCTATAATCCTTAGTAAATCCTCGTCTTC7AATACTACCT 

HrYVMACLCHLPFRYtQSFTSRRL' CI- 
TFT'WRAYVTYLSOITNHLPAEOYOA 
A*;.LRNr, VLMSLTFPltPIIYOOKlMMH- 

Tn insertion P9B'' 
U 

TATTGGTAAAGATTACACTTGGTTACGGTTAGTCACTGGAGACCTCATTATTGGTTTTTCAATTGCCTTTTGTCCGGCGG 

9761 ♦ • » ♦ • » 9B40 

ATAACCATTTCTAATGTC.^ACCAATCCCAATCAGTGACCTCTCCACTAATAACCAAAAAGTTAACCCAAAACACCCCGCC 

LVKITVG-C'SLER-ULVFOLGFVRR 
YW - RLOLVRVSHWRGDYWFFNWVLCGG- 
I CKOVSKLG LVTGEV t IGFS I .G FCAAV- 

TTCCCTTTTGGGCCGTTGATATGGCGGGCTTTCTGCTTGATACTTTACGTGGCCCGACAATGGGTACCATATTCAATTCT 

9841 - — ♦ • ♦ ♦ ♦ 9920 

AAGCGAAAACCCGGCAACTATACCGCCCCAAACACGAACTATCAAATCCACCGCCCTCTTACCCATCCTATAAGTTAACA 

FPFCPLIWRGFCLl LYVAROHVRYSIL- 
SLLCR' YCGVSA* YFTWRONGYDIQFY- 
PFW AVOMACFLLDTLRCATMGTIFNS 

ACAATAGAAGCTCAAACCTCACTTTTTGGCTTCCTTTTCAGCCAGTTCTTGTGTGTTATTTTCTTTATAAGCGGCGGCAT 

9921 • ♦ ♦ * ♦ ♦ lOOOC 

TGTTATCTTCGACTTTGGAGTGAAAAACCGAACGAAAAGTCCGTCAAGAACACACAATAAAAGAAATATTCCCCGCCGTA 

0» KLKHHFUACFSASSCVLFSL - AAAW- 
NRC.NLTFWLAFQPVLVCYFL^KRRH 
TICAlTSLFGLLFSOFLCVI FFI SGGM- 

GGAGrrTATATTAAACATTCTCTATGAGTCATATCAATATTTACCACCACGCCCTACTTTATTATTTGACCACCAAT^ 

1 0001 • ♦ ♦ • • ♦ • 1 0080 

CCTCAAATATAATTTGTAAGACATACTCACTATACTTATAAATGCTCCTCCCCCATGAAATAATAAACTGGTCGTTAAAA 

SLY-TFCMSHlNllMOCVtYYLTSNr 
CVYIKHSV'VISIFTTRAYFII* PAIF- 
EFILNILYESYOYLPPGRTLLFDOOFL- 

TAAAATATATCCAGGCAGACTCGAGAACCCTTTATCAATTATGTATCACCTTCTCTCTTCCTCCCATAATATGTATGGTA 

lOOBl • ♦ ♦ * • * ■ • • 1016C 

ATTTTATATACGTCCGTCTCACCTCTTCCGAAATACTTAATACATAGTCGAAGAGAGAAGGACGGTATTATACATACCAT 

•NISROSCERFINYVSASLFLP - YVWY- 
KlYPGRVENALSIMtQLLSSCMNKYGI- 
KYlOAEWRTLYOtClSFSLPAIICMV 

TTAGCCGATCTGGCTTTAGGTCTTTTAAATCCGTCGGCACAACAArrGAATGTCTTTTTCTTCTCAATCCCGCTCAAAAC 

10161 • — - ♦ I 024 : 

AATCGCCTACACCCAAATCCAGAAAATTTACCCAGCCGTCTTCTTAACTTACACAAAAAGAACACTTACGGCGACTTTTC 

( I WI*\'F - ;GBMNN*MCFS5JC CRSKV. 
3 k ; ; - r R S r K S V G T T 1 F. r V F L L N A A 0 * K 
LADi. A lGLLnFSAOQLNVFFFSMPLSS- 
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TATATTCCTTCTACTGACCYCCTCATCTCArrCCCTTATCCTCTTCATCACTATTTCCTTGAA.^CCCATA^J\TTTTATAT 

i07t\ • ♦ - 10320 

ATATAACCAAGATGACTCCRCGACTAGAGTAACGGAATACGAGAACTACTGATAAACCAACTTTCCCTATTTAAAATATA 

YWTY - RPOLt PLCSSSLFC* KR^ ILY 
YiGSTU? LI Gr?YALHHYLVtSOKr vi- 
I LVLLT?' SMSLMLriTIWLKAINFrr- 

TTATCTAAAAGACTGCTTTCCATCTCTATCACCGAGAAAACAGAACAGCCTACAGAAAAGAAATTACCTCATGCCCGTAA 

10321 ♦ 10400 

AATACATTTTCTGACCAAAGGTACACATACTCGCTCTTTTGTCTTCTCGGATGTCTTTTCTTTAATGCACTACCGCCATT 

end yscT* stare yscU* 

LSKRLVSIC H S r K T g OPTEKKLROCRK- 
Y L K D W r P S V ' ARKONSLOKRH Y VMA VR- 
! • KTGrMLYERENRTAYRKC I 7 * wp- 

GGAAGCCC^CCTTGTCAAAAGTATTGAAATAACATCATTATTTCACCTCATTGCGCTTTATTTCTATTTTCATTTCTTTA 

10401 - 10480 

CCTTCCCCTCCAACACTTTTCATA.'^CTTTATTCTAGTAATAAAGTCCACTAACGCGAAATAAACATAAAAGT^^AGAAAT 

EG0VVK3 I E ITStrOLIALY LVrMrFT- 
KGRLSKVLK* HHYTS* LRTICI F;SL 
CRACCQKY - NKTI I SAOCALF vrfrrLY- 

CTCAAAACATGATTTTGATACTCATTCAGTCAATAACTTTCACATTACAATTAGTAAATAAACCATTTTCrTATCCATTA 

10481 --♦ ♦ * 10560 

GACTTTTCTACTAAAACTATCAC7AACTCACTTATTGAAAGTGTAATGTTAATCATTTATTTCCTAAAAGA;;rACGTAAT 

EKMILILIE3ITFTL0LVNKPFSY A L 
LKR - F*Y* ISO* LSHYN - ' lNHFLM-i« - 
KDDFDTD' VNWFHITISK* TJ F'.ClN- 

ACGCAATTGAGTCATGCTTTAATAGAGTCACTGACTTCTGCACTGCTGTTTCTGGCCGCTCGCGTAATACTTGCTACTGT 

10561 ♦ • ♦ • ♦ 10640 

TGCGTTAACTCAGTACGAAATTATCTCAGTGACTGAAGACCTGACGACAAAGACCCGCGACCCCATTATC;'ACGATGACA 

70LSHAL1ESLTSALLFLGAGVIVA1V- 
RN* VML* • SH* LLHCCFWALG - • LuLW- 
AlESCrNRVTDFCTAVSGRWGNSrrC 

CCCTAGCGTGTTTCTTCAGGTCGCGCTGGTTA7TGCCAGCAAGGCCATTGGTTTTAAAACCC.AGCATATA.-^TCCGGTAA 

10641 - ♦ 107?0 

CCCATCGCACAAACAACTCCACCCCCACCAATAacGGTCGTTCCGGTAACCAAAATTTTCCCTCCTATATT7AC-:.:CATT 

GSVFLOVGVVI ASKAIGFKSEHIMT 3- 
VACFFRWGWLLFARPLVLKASl* I ^ ' 
C' RVSSGGGGYCOOGHHF* KRAYK£ OK- 

G7A.^T7TTAAGCAGATATTCTCTTTACATAGCGTAGTAGAATTATG7AA^7CCACCCTAAAACTTATCATGC^^a7CTCTT 

10721 - - ♦ 10800 

CAT7AAAATTCGTCTATAAGAGAAA7GTATCCCATCATCTTAATACATTTACCTCCGATTTTC^TACTACGA7AGAGAA 

NFKOl FSLHSVVELCKSSLK VIML 5L 
VILSRYSLYIA*' NYVNPA* KLSC :-L- 
r* AOILFT' RSRIM' I Q P K S \ HAISV- 

ATCTTTCCCTTTTTCTTTTATTATTATCCCAG7ACTTTTCGGGCGCTACCGTACTGTCCCTTAGCC7GTGCCG7GCTTCT 

loeoi • ♦ ♦ — ♦ • loeec 

TAGAAACGGAAAAAGAAAATAATAATACCGTCATGAAAACCCCCCCATGGCATCACACCCAATCGGACACCGCACGAACA 

IFAFFFYYYASTFRALPYCGLACCwLV. 
SLPFSFl lMPVtFGRYR7VC' PVACLW- 
LCLFLLLLC0YFSGATVUWV3LHRAC 

GCTTTCTTCTTTAATAAAATGGTTATCCGTACCGC7CATGGTTTTTTATATCGTCGTTGGCATACTGCACTATTCTTTTC 

108B1 - ♦ ' ♦ ♦ ♦ 10960 

CCAAACAAGAAATTATTTTACCAATACCCA7CCCCAC7ACCAAAAAATATAGCAGCAACCGTATCACCTGATAA.CAAAAC 

VSSL IKWLWVC. MVFYIVVG I LOYSFQ- 
FLL''HGYC-0'WFFIS5LA>W7I--F 
GFFFMKMVMCR SDGFLYRHWH7GLFF .S - 
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AATATTATAACATTAGAAA/*GCTATCTAu\AA^TGAGTAAACATGACGTAAAACAGCAGCATAAAGATCTGGACCCC<^ 

10961 - - liO^O 

TTATAATATTCTA^TCTTTTCGATACATTTTTACTCATTTCTACTCCATTTTCTCCTCnTATTTCTACACCTCCCCCTGC 

YYKIRKAI'K'VKHT-NRStKIWRAT 
NI IRLf. KLSKNE' H* RKTGA* RSGCRP- 
IL* 0' KS YLKMSKDOVKOEMKOLCCOP- 

Tn insertion Pl2Fb 
11 

CTCAAATGAAGACGCGCCGTCGCAAATCCACAGTGAAATACAAAGTGCGAGTTTACCTCAATCTCTTAAACAATCTCTTG 

IIOM - — ♦ ♦ * ♦ 11120 

GACTTTACTTCTGCCCCCCACCCTTTACGTCTCACTTTATCTTTCACCCTCAAATCCACTTAGACAATTTGTTACACAAC 

LK* RRGV CNAC'NTKWErSSIC'TICC- 
SNT DAASCMOSEI QSCSLAOSVKOSVA- 
OMKTRRkKCRVKYKVCV* LNLUNNLU 

CCGTAGTCCGTAATCCA.^CGCATATTGCGGTTTCTCTTGGCTATCATCCCACCCATATGCCAATACCACCCCTCCTCCAA 

11121 ♦ ♦ 11200 

CCCATCACCCATTACCTTCCCTATAACGCCAAACACAACCGATAGTAGGGTGGCTATACGGTTATGGTGCGCAGGACCTT 

GSA - SNArCGLSWLSSHRrANTTRPCK- 
VVRNPTHIAVCLGYHPTDMPIPBVLE 
R' CVIQRI LR FVLAI 1 PPICOYHASWK- 

AAAGCCAGTGATGCTC-ACCTA'VCTATArrGTTAACATCGCTGAACCCA.^CTGCATCCCCGTTGTTGAAAATCTTGACCT 

ii:Ol * - U290 

TTTCCCTCACTACGACT7CGATTGATATAACAATTGTAGCCACTTGCGTTCACGTAGCGCCAAC.^ACTTTTACAACTCGA 

RO* CS S* LYC* HR - TOLHPRC* KC - A 
KGSD AQA?*Y ! VNlAERNCI PVVENVEL- 
KAVhtLKLi: LLTSLNATASPLLKMLSW- 

GCCCCCCTCATTATTTTTTC.^ACTGCAACCCGGAGATAAAATTCCTGAAACCTTATTTCAACCCCTTGCAGCCTTCTTAC 

112BI ♦ 11360 

CCGGGCGAGTAATAA^AAJ^CTTCACCTTCCCCCrCTATTTTAAGCACTTTGCAATAAACTTGCCCAACGTCGGAACAATG 

GPL! 1 r* SGTRR' NS* NVI' TRCSLVT- 
ARSLrrtvERGCKI PCTLFEPVAALtR- 
PAHyri K-WNAElKFLKRVLNPtOPCY 

GTATGGTGATGAAGATAGATTATGCGCATTCTACCGAAACACCATAAATCCTTTTGGTATGCTTCTTCACCCCACTGCCA 

1136J - M440 

CATACCACTACTTCTATCTAATACGCGTAAGATCGCTTTGTGGTATTTACGAAAACCATACGAACAAGTCCCCTGACGCT 

end yscU* 

YGDtDR LCAFYRNTINArr. MLLOATAK- 
M V M K I D Y A H S T !! T P ■ MLLVCFFRPLR 
V W • • P • ! M R I L P K H H K V- F W I A S S G J! C t - 

aggttaagagggtaataccgtat.m;agcagtgcttgaccataaaggtgacagactga.w\taatcgcttttagcctgcca 

♦ ♦ * ♦ liiro 

tccaattctcccattatcgcatatctcctcacgaactgctatttccactctctcacttttattaccgaaaatcggaccct 

vkrviayravlookcerlkiiafsla 
rlrg - • rieocltikvro - k» sllawh- 
g' ecnsv' ssa-r' r'etennrf - pgt- 

CAAGCACCAGATAGCGTATTAT.^AAATTAAACAAGATAATCGATTGGTGCGTCTGAATGGACTCGAACCAcTCGACCCCC 

U&21 - ♦ lloOC 

GTTCGTCGTCT AT CGCAT AATATTTTAATTTCTTCTATTACCTAACCACCC AGACT TACCTGAGCTTGCTqAGCTCCCGC 

OAPCSVL' r;* TR* WlCASEWTRTTRPP- 
KHOIAYYKI KODNCLVRLNCLCPLDPH- 
STR'RIIKLMKIMDWCV'MOSNMSTP 

ACCATGTWGGTCGTGCTCTA.'XCCAACTGAGCTATGAACGGCAACCTTGTAGGTGACAACGCGGACCAATATTAGCGTC 

1160J - - - 11660 

TCGTACACTTCCACCACCAGATTGGTTGACTCGATACTTCCCGTTCCAACATCCACTGTTGCCCCTGCTTATAATCGCAG 

P C C> G A L T • A M N G N V V (V I. N G D E Y * R M - 

H V K \- V L • r T C L • T A T L * V T T G T N I S V 
TMSRWCSNOLSYEROHCP* ORGRl LA5- 
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ACAACCGCAATCACCC^'AGAGGCAAATCCCAAmTCTTCCTCAAATCACCTCATTGCCCTCCAAATATCCAACATC^ 

11691 ineo 

TCTTCCCCTTACTCCGTTCTCCCTTTAGCCTTAAAAGAAGC^CTTTACTGCACTAACCCCACCTTTATACCrrCTACACC 

WRNEAHCKSOrSS'MHLIAVEICNMS 
TTAMHQCCNRNrLPEIT - L«WKtATCR- 
QPC' CKRr.lAl FFLKSPOCCGNMOHVE- 

AGAAAATACCCGCCATGCGACCCCTATCGTCGTATTATCCGACCCCCCTGCAAAATGATCGCGGACCCCTCACCTTCTAG 

11761 - ♦ UB40 

TCTTTTATCGCCGGTACGCTCCCCATAGCAGCATAATAGCCTCGCGCGACCTTTTACTACCGCCTCCCCACTCCAACATC 

RK« PPCOGTHR I IGARCKMMADG* RCR- 
r. NSRMATAIVVLSERAAK - HRTADVVO- 
KIA^MRRLSSTYRSALONDGGRLTL* 

ATACCGCATCCCTAGCATCATTAACACCGCCCCCGAGCTCAGGCCGATGATGAACCCCATCCAGAACCCTGCCGCTCCCA 

iXBi) ♦ ♦ ♦ n9?o 

TATCGCGTACCCATCGTAGTAATTCTGGCCCCGCCTCCAGTCCGCCTACTACTTCGGGTACCTCTTCCGACGCCCACGGT 

• RIRSI INTAACVRPMMNPIOKPAGPl- 
JAS. -. SLTPPPRSGR-'TPSRSLPVp 

: ahF - HH - HftRRGOAOOrPMPEACRSH- 
TACCATCCACCACCAAATCCGTTAACGCCAGGATATAACCGCTGGGTAAACCTAACACCCAGTAGGCGGTAA.^CCTGATA 

119?: - • — UOOO 

ATCCTACCTGCTGCTTTAGGCAATTGCGCTCCTATATTGGCGACCCATTTGGATTCTCCCTCATCCCCCATTTCCACTAT 

^STTK SVNARI* PLGKPNTO'AVK VI 
r DPPPMPLTPCYNRWVNLTPSRR' ft' 
Ti riMOIR' RODITAG' ?• HPVCCKCOK- 

A.aAAAGATGGA.ACGCGTATCTTTATA^CCCCGCACAATACCGCTCCCGATAACCTGTATAGAGTCGGAAATCTGGTAAAC 

I ?001 * ♦ * ♦ ♦ ♦ ♦ . 12080 

TTTTTCTACCTTCCGCATAGAAATATTGCCGCCTCTTATGGCGACGGCTATTGGACATATCrCAGCCTTTAGACCATTTC 

KKME.^VSL* PRRIPLPITCIESEIW-T- 
KRW.NAYLYNRAEYRCR* PV» SRKSGKP- 
KO CTR I riTAQNTAAOWLYRVGNLVN 

CGCAGCa-.GCAGCATTAATTCCGGCAACCCCCACGACCrrCAGCGTTCTCATTGTACACCAAAGCAATATGCTTACGCAGA 

12091 ♦ 12160 

CCGTCGCTCGTCGTAMTAACCCCCTTCGCGGTGCTGGAGTCCCAACAGTAACATCTCCTTTCCTTATACGAATGCGTCT 

AASS iNCGKRHOLRVVIVEQSNMLTOS- 
ORi .-. LIAASATTSGLSL'SKAICL?iR 
i»Sr. C- r' LRQA PRPOGTHCRAKgYATi AE 
G7AACCGTAAA,J^A.TAGCGGTAACCACAGCCATACAAATCCCGACGCCTAAACCGCTACCCGCTCCCTTTCC^ 

I2l6i ♦ i2?<0 

CATTGCCATTT7TATCCCCATTGCTCTCGGTATCTTTACCCCTGCGGATTTCGCCATGCGCGACCCAAACGCGTAGCTCC 

NGKNSGNHSHTMAOA*TCTRCVCASS 
VTVKJ AVTTAIOMPTPKPVRAAFAHPA- 
• a - K' R* POPYKCRRLNRYALRLRIOR- 

CTTCACCCCTGCCCCAGACCGATAACCCACTCGAATCGTTACCGCCCCACCCAGCGACATCCCCAGTACGAACATCACCC 

J2241 ♦ - ♦ ♦ ♦ * 12320 

CAACTCGGCACCGGGTCTGCCTATTCGCTGAGCTTAGCAATCCCCGCGTCGGTCCCTGTAGCCGTCATCCTTGTAGTCCC 

VEPWPRPI THSMRYRRSQRHROYEHQR - 
LSfGPDR' PTRIVTAAASOICSTNISE- 

• Al *0TDNPLCSLPP0PATSAVRT3A 
AGCTAAAGTTAAGCGCAATCTCATCACCGGCGACATCCACAATACCTAATCCCCAAACCAGCAGCCCAACGACCCCAAAT 

12321 ♦ ♦ ♦ 12400 

TCCATTTCAATTCGCGTTAGACTACTGCCCGCTCTAGCTGTTATCCATTACCGCTTTGCTCCTCGCCTTGCTCCCGTTTA 

AKVKRNLMTCOIHMT' HkHQQnt40RK'- 
LKLSAI'^PATSTIPNGtTSSATTAN 
S" S" - OSODRRMP -O YLMA KPAAORP OI 
AACGTCACTTC.wGAACAGCCAGCGCAATCGCCAACCCCAGTTGAATCACCCGCTTCATCACCACGCTATCGG^ 

12401 ♦ ♦ ♦ • 124B0 

TTGCAGTGA.ACTTTCTTGTCGGTCGCCTTAGCCGTTGCGGTCAACTTAGTCCGCGAACTACTGCTGCCATACCCCAAACG 

flllKK-LOPAOSAT I'VCSGAS* RRYRVC 
KVTS/NSORNROPOLNOALHDDAIGfA- 
TS LO HTASAIGHPS* IRRFMTTLSCLP- 
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CAAAGCCTTTTTCATTACGAATATCACCCATTGAACGCGCCTCTTTAATCTAAGAAACCATCGCCATAAACATCACCCAA 

\2*Ql - - \2b^0 

(;TTTCCOAAA,^AGTAATCCTTATAGTGCGTAACrrCCCCGCACAAATTACATTCTTTCCTACCGCTATTTGTAGTCGGTT 

QSLrHYCrHALNARV.* CKKAWR* TSPN- 
KAfriTNlTH* TRVrNVRKHGDKHMPl- 
KRrSLRISRICRACLM'tSMAINtTO 

TAGACCGCCCCACTCCCAACCCCGCAGCCCATACCCCCCAGTTCCCCCATACCAAAATCGCCATACATAAAAATATACTT 

17!)6I ♦ ♦ »?«<0 

ATCTGCCGGCCTCACCGTTCCGGCGTCCCCTATGGCCGCTCAAGGCCCTATCCTTTTACCGGTATCTATTTTTATATCAA 

RPPOSORRSRT RRVPAYOUGMR* KYSS- 
ORRSRNAAAOTAEFRHTKMAIDKNIV 
•T/^ AVATPOPIPPSSGIPKWP - iKT-r- 

CACCCGA.tTATTCACCAGCACGCCCAAAAATCCCATCACCATACCCGGTTTGGTrrTCCCCACACCrrCGCACTCCTTTC 

17641 > ♦ ♦ ♦ ♦ • l?^20 

GTGGCCTTATAACTGCTCCTCCGGGTTTTTACCGTAGTGGTATCGGCCAAACCAAAACCCCTCTGGAACCCTGACCAAAC 

PE tSPAGPKIPSPYPVWrwPOLRTCr 
HRH I HOOAOKSHHHTRrCrGOTFALVS- 
TG! rTSRPKMPITIPCLVLARPSMWrR- 
GCGCTACCTCA^AGAAAAGGTATCCTGCCCCCCACAGCACCCCGCGAAGATAACCCACGGCTTTATCGGCCAGCCCCGCA 

12721 - ♦ ♦ ♦ * ♦ i?800 

CCCGATGGACTTTCTTTTCCATAGGACCCGCGGTCTCGTCGCGCGCTTCTATTGCGTCCCCAAATAGCCGCTCGCCGCCT 

ALPCRKGl LRPTAARCDNPRLYRPAPD 
RT i.KEK VSCAPOgRAK ITHGr I GORR I- 
AT' KKRYPAPH5SARR* PTALSASAG 

TCAAT.-TTATGCATAGAGCGGATAATGTATCCCGCATTCCACAGGACGATCATCACCAGCACGGACACAAAGCCCGCCAC* 

12801 '-'980 

AGTTATAATACGTATCTCGCCTATTACATACGCCCTAAGGTCTCCTCCTACTACTCGTCCTGCCTCTGTTTCGGGCGGTC 

OYiA' SG* CI RHSTCRSSPARROSPPA- 
NIMHRADMVSGIPODDHHOHGDKARO 
SILCIERIMYPA rHRTI ITSTETKPAS- 

CCACAACCCTTGTCCAACCTGATGCGCGATACCCTCACCACGCCCGCAGCCATTCAGTTCCCCAATCACAGGCGTCA^OC 

12881 * ♦ »"60 

GCTCTTCGGAACACCTTGGACTACGCGCTATGCGAGTCCTGCCGCCCTCGGTAACTCAACGCGTTACTCTCCCCACTTCC 

RTLVEPDARYAHDGRSH* VAQSQAS- 
PEFLSNLMRDTLTTACAIELRNHRRCO- 
^NTCRT' CAIRSRRPEPLSCAITGVKA. 
CCAGCAGTAACCCGTGACCAAACAAAATGGCCGCA.»VCCAGATAGAGGTGCCCATACCGACCCCAGCCATCTCCCTAGCGC 

12961 ♦ ♦ - nojc 

GCTCGTCATTCGCCACTCGTTTCTTTTACCCCCCTTCGTCTATCTCCACGCCTATCGCTCCCGTCGGTACAGCCATCGCG 

pAV3RD0TKWRtAORGAOSOGSHVR3A - 
00' AVTKONCGKOIEVPIATAAMSVAL- 
SSKP ' PNKMACSR* RCR^ RROPCP' h 

TATAGCCTCCCGCCATGACCGTATCGACGAATCCATTCCCGTCTATACCACTTCCGCAACGATCACCCGTATCTGAACGC 

13041 ♦ ♦ »3120 

ATATCGGACCGCCCTACTGCCATACCTCCTTAGCTW».CCCCACATATCCTCAACGCGTTCCTAGTCGCCATAGACTTCCC 

lASRMDClOESIAVYTTCAR'TGI - TL- 
• pPAMTVSTHPLRSIPLAOGSPVStR 
ySLPP' RYRRI MCCLYHLRKOHRYLMA- 
TAATAACTCACCCCCTTCACTCCTATACTTCTCCACGTATTCACCTTTTATTTTCTTGTTATATCAAAGACTAAAAACCC 

13121 - ♦ ♦ 13200 

ATTArrGACTCCGCGAAGTCACCATATCAAGACCTGCATAACTCGAAAATAAAACAACAATATACTTTCTCATTTTTCGG 

ITOALHWYTSARIHLLFCCyMKO'KA 
• • LTRrTCILLHVFTFYrvvi' KTKKP- 
NN - RASLVYFCTYSPFIL^LLV E RLKSP- 

GCCCAAGTGCCACCCAAAACAAATAGCAGCGGAAATTTCAGTCTATTGTACCGGCGTATTACTATTTCTCCAGTGAAAAA 

13201 ♦ ♦ - ♦ ♦ 13280 

CGCCTTCACCGTCGCTTTTCTTTATCGTCCCCTTTAAAGTCAGATAACATCGCCCCATAATGATAAACAGCTCACTTTTT 

AEVAASHNSHCNrSLL* flCITl SPVKK- 
PK WOPKE rAGE ISVYCSGVLLFLO* KN- 
RSCSOKK'OGKrOSIVACYYYfSSEK 
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CAAArpCCTTTTTCATTACGAATATCACCCATTCWKCGCCCCTCTTTAATGTAAGAAACCATCWCGATAMCATCACC<^ 

\2*B\ - - - iirseo 

{;tttcggaaa^actaatgcttatagtgcctaacttccccgcacaaattacattctttcgtacccctatttgtactccctt 

QSLrHrCrHALNARV.' CKKAWR* TSPN. 
KAi riTNlTH* TRVrNVRKHCOKMMPI- 
KPrSLRISftieRACLM'ESMAINlTO 

TAGACCCCCGCACTCCCAACCCCGCACCCCATACCGCCCAGT7CCCCCATACCAAAATCCCCATACATAAAAATATAGTT 

l?b61 ♦ 12640 

ATCTGCCGGCGTCAGCGTTGCCGCGTCCGCTATGGCCGCTCAACGCCGTATCCTTTTACCCGTATCTATTTTTATATCAA 

RPPOSQRRSRtRRVpAYONGHR* KYSS- 
ORRSBMAAADTAErRHTKMAIDKNlV 

• T/-VAVATPQP1 PPSSCIPKWP' IKI' F- 

CACCCCAATATTCACCAGCAGGCCCAAAAATCCCATCACCATACCCCGTTTGCTTTTGGCCAGACCrrCGCACTCGTTTC 

12641 ♦ ♦ ♦ ♦ 12^20 

GTCCCCTTATAACTGCTCGTCCGGCTTTTTACGGTAGTGCTATGGCCCAAACCAAAACCCCTCTGGAACCCTGACCAAAC 

PC i SPACPKIPSPYPVWFWPOLRTGF 
HRN I HCOAOKSHHHTRfCFGOTFALVS - 
TCI rTSRPKNPITlPCLVLARPSHwrR- 
GCGCTACCTCAA>»GAAAAGGTATCCTGCCCCCCACAGCAGCGCGCGAAGATAACCCACCCCTTTATCGGCCAGCGCCCGA 

\2T2\ ♦ ♦ * * 12B00 

CGCGATGGACTTTCTTTTCCATAGGACCCGCCGTCTCGTCGCGCGCTTCTATTGCGTGCCGAAATAGCCGCTCCCCCCCT 

ALPtRKGlLRPTAAREDNPRLYRPAPO 
R Y LK PKVSCAPQORA X I THG F 1 CQRR I- 
AT-KKRYPAPH5SARR* PTALSASAG 

TCAATATTATCCATACAGCGGATAATCTATCCCGCATTCCACAGGACGATCATCACCAGCACGGAGACAAAGCCCCCCAO 

12801 ♦ »-'9e^> 

AGTTATAATACCTATCTCGCCTATTACATAGGCCCTAAGGTGTCCTGCTAGTACTGGTCCTCCCTCTGTTTCCCGCGGTC 

OY « A' SO - C ! RMSTCRSSPARROSPPA- 
NIMHRADNVS GIPODDHHOHCOKARO 
SI UCIERIMYPA rHRTI ITST E TKPAS- 

CCAGA.ACCCrTGTCCAACCTGATGCGCGATACGCTCACGACGCCCCGAGCCATTGAGTTCCGCAATCACAGGCGTCA«vCG 

12891 ♦ — - J2960 

GCTCTTCGGAACACCTTGGACTACGCCCTATCCCACTCCTGCCGGCCTCGCTAACTCAACGCGTTACTGTCCGCACTTCC 

RTLVCPDARYAHDCRSH* VAOSQAS- 
PEFLSNLMRDTLTTAGAlELRNHRRQv- 
ONr rRT* CAtRSRRPEPLSCAITGVKA. 
CCAGCAGTAAGCCGTCACCAAACAAAATCGCGCCA.«VGCAGATAGAGGTCCCGATAGCGACGGCAGCCATCTCCCTACCCC 

12961 ♦ ♦ ♦ ♦ * 1304C 

GGTCGTCATTCCCCACTCCTTTCTTTTACCGCCCTTCGTCTATCTCCACGGCTATCGCTCCCGTCGGTACAGGCATCGCG 

PAVSRDOTKHREAORGADSOGSHVRSA- 
Og* AVTKOMCGKOIEVPIATAAMSVAL- 
SSKP" PNKMAGSR* RCR" RROPCP* ^ 

TATAGCCTCCCCCCATCACGCTATCCACGAATCCATTCCGCTCTATACCACTTCCGCAACGATCACCGGTATCTGAACGC 
1 3041 ♦ ♦ ♦ 13120 

atatcccagcgccctactcccatacctccttacgta;».ccccagatatcctgaacccgttcctactggccatacacttgcc 

lASRHDClOESIAVYTTCAR ITGl* TL- 

• ppamtvstnplrsiplaogspvseb 

TSLPP' RYRRIHCGLYHLUKOHRYLNA- 

taataactgacgcgcttcactggtatacttctccacgtattcaccttttattttgttgttatatcaaagactaaaaaccc 

13121 13200 

attattcactccgcgaactgaccatatgaagacctccataagtcgaaaataaaacaacaatatactttctcatttttcgc 

itoalhwytsarihllfccymko'ka 

• . ltpftci llmvftryfvvi* ktkkp- 
nn - raslvyfctyspfil^llyerlksp- 

gccgaagtgccagccaaaagaaatagcaggggaaatttcagtctattctagccccgtattactatttctccagtgaaaaa 

1 3201 ♦ ♦ ♦ — - 13280 

cggcttcaccgtccgttttctttatcgtcccctttaaagtcagataacatcgccccataatgataaacaggtcacttttt 

AEVAASKNSHCNrSLL» RGITISPVKK- 
PkwoPkE IAGC I SVYCSGVLLFLO' KN- 
RSCSOKK'OGKFOSIVAGYyYfSSEK 
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ACAGTTCTTAACGCCCCATTCCrrCCCAACCTCTTTTTCCACCTGCTATTGTCCTGAACACTTCTCCTTTTATTTAT^ 

1 3360 

TGTCAACAATTCCCCCCTAACGACCCTTCCACAAAAACCTGGACGATAACACCACTTCTOyvCACGAAAATAAAT;!^ 

OLtTAHCWQAVrPPAIVLNSSAri YTR- 
SC'RRIACKLrrHLLLC-TVLLLriS 

TvvNCA LLAScrsTCYCACorcfr tro- 

CGACTTGAAGATATGTTTACCCCGATCCTACACGCTACCGCCAAACTGGTATCCATA 

13361 ♦ — ♦ ♦ ♦ I3«n 

CCTCAACTTCTATACAAATCCCCCTACCATCTCCCATGCCCCTTTGACCATACCTAT 

S'RYVYGDRTG VRETGID 
GVEOMFTGlVOCTAKtVSI 
ELKICLRGSYRVpRMWYR 
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DNA sequence of VGC II cluster C 

Tn insertion P9B< 
8 

OCATCCTTTTTCTTTAATGCTCCTAACCTTTCTTCCAAAATCCCTTGATGACATTCATCCACTACACCACTCATAACAM 

1 ♦ ♦ ♦ - — ♦ - ♦ ♦ 80 

CCTACGAAAAAGAAATTACGACCyKTTCCAAACAACCTTTTACCCAACTACTCTAACTAGGTCATGTCGTCACTATT^ 

Tn insertion P7AJ 

AGAGCCNCCCATTGCCNWAMMvn-KRNNMRNNSCNNNACTAAACCCTTCTCTATTATCCCACAAATAATATCATCCCCCTG 

81 ♦ ♦ ♦ 160 

TCTCCCNGCCT AACCGNHTK KWAM r NNKY NNSGMNNTGATTTGGCAACAGATAAT ACCGT CTTT ATT ATAGT AGCCCGAC 

AGACrCATGAGACTGACTAATCTCCCAGTGCAATAACCCGGCAATATCTCCAAGTAATCGTTGAACCTTGCCCCATTGCT 

161 - ♦ ♦ ♦ 2*0 

TCTGACTACTCTCACTCATTACACGCTCACGTTATTGGGCCCTTATACACGTTCATTACCAACTTGGAACGCGGTMCCA 

Tn insertion P^S"! 

GATCCATTTCTATATCATCATGAATTAACACCCTCCCCGGCCCTTCCCTCGATACTTCACCATNSSGGTAACCCATTTTT 

241 * - ♦ — ♦ ♦ 3?0 

CTAGGTAAACATATAGTAGTACTTAATTGTGCGAGGGGCCGGGAACCCACCTATGAAGTCGTANSSCCATTGGGTAAAAA 

ATC/'AAACATCCTGCACTTCTCCTACCAATAAGTCATCACACATTACACCATCCCCATACATGACCCCCCATaiTTCCAG 

321 ♦ ♦ ♦ * ♦ • 100 

TAGTTTTGTAGGACGTGAAGACCATGGTTATTCAGTAGTGTCTAATGTGGTAGGGCTATGTACTCCGGGGTACTiAGCTC 

AGTCCCTCTCACCT7TTGCATCTGTTCGCTTGACGAGCAATAACCGGACAACTGCAGGCTGCCATCTTCTTTCCATTGCG 

401 ♦ * . ♦ ♦ * * 4 80 

TCAGCGACAGTGGAAAACGTAGACAAGCGAACTCCTCCTTATTGCCCTGTTCACGTCCGACCGTAGAAGAAAGGTA.ACCC 

CCCCCACATAATGAATATTCCTTTTGTCTAATAAAAACTTAACCCGCAAAGCTAACTCATTTACCGTrrCAGCCTGACCA 

481 - — ♦ ♦ ♦ J60 

GGGCGTCTATTACTTATAACGAAAACAGATTATTTTTGAATTGGGCiSTTTCCATTCAGTAAATGGCAAAGTCCGACTGGT 

CTAA7ACT7>^5vCAGGACACCCATTCCACCGATGAAAATCAAGAATACGCCAGCCAACCACCACTACCCTGATCTGGAAAC 

S61 ♦ ♦ — ♦ ♦ ♦ 640 

GATTATCAATTGTCCTGTCGGTAAGGTCCCTACTTTTAGTTCTTATCCGCTCGGTTGGTCGTCATCCGACTAGACCTTTG 

CGGTATTTGATAATCACCAAGTTCACAATCCTCTTTACCAAACGCGATASSCACTCCCCCAACCTGCAAAACCCCACTCG 

641 - ♦ ♦ 730 

CCCATAAACTATTAGTCGTTCAAGTGTTAGGACAAATGGTTTGCGCTATSSGTGACGCCGTTGGACCTTTTCGGGTGACC 

ATGGTACCGGCTT.-TTTGGATTAAATCTGCCCCCATTAACTCTAACTCTGGCTTTCCCGGCATCAACAAATAAACTATCT 

721 • ♦ ♦ ♦ 300 

TACCATCGCCCAATAAACCTAArrTAGACGCCGGTAATTGAGATTGAGACCGAAACGGCCCTAGTTGTTTATTTGATAGA 

CCCTGTTCTCTCAGAATAATTTTTTCATTTATAGCCAGCGAATACAAATATCGCATCCCTTCTCCCCCAGTGACAGGTTA 

801 ♦ ♦ ♦ — — ♦ 980 

CCCACAACAGAGTCTTATTAAAAAAGTAAATATCGGTCGCTTATCTTTATAGCCTAGCCAACACGCGCTCACTGTCCAAT 

CCTTCATTCACCCATACrrCCCCCCCTTGTAAAACCTCACCTAAAAAACCTATTTTCCACGAACTCTTTGGATTAACCAT 

881 - ♦ ♦ ♦ ♦ 960 

GGAAGTAAGTCGGTATGAAGGGCCGGAACATTrrGCACTGGATTTTTTGCATAAAAGGTCCTTGAGAAACCTAATTGGTA 

GAGATATGCCATTATTTACTACTGACGCTTTAATCAAAAAAAGCCTGATTACACTATCTACTTGAGTCCTATCATTCCCA 

961 — ♦ ♦ ♦ ♦ 1040 

CTCTATACGGTAJ^TAWCATGACTCCGAAATTAGTrnrrrCGGACTAATCTCATACATGAACTCAGCATAGTAACrc 

AACAAATGACCTAC.WCAGGAATATCGCCCAATAAAGGCATTTTCTTTTGCGAGTCCATTTCTTTACCTTCTTTAAACCC 

1041 - " ♦ — ♦ — ™ ♦ 1120 

TTGTTTACTGGATCTTGTa;TTATACCCGCr^ATr^CCCTAAAACA^AACGCTCACCTAAACAAATGCAACAAATTT 

TCCCAGCAATNAGACTTTGCCCGCCCAATAATGTGCCTTGCCAANCRATTTCACAATTTTGCACTTCGGGCACCCCCTCT 

1121 ♦ ♦ ♦ ♦ ♦ ♦ 1200 

ACGGTCGTTANTCTCAAACCGCCCGGTTATTACACCGAACCCTTNGYTAAAGTCTTAAAACGTGAAGCCCGTCGCCCAGA 

GTNTrCCyTTKGNSTATCACTTTGTTGTCCATCCTGAANTATTAAGATTAACCATTATTTTTTCCCTGCCATTGTCATTT 

1201 ♦ — * — ♦ 1280 

CANARCGRAAMCN3ATACTGAAACAACAGCTAGCACTTNATAATTCTAATTCGTAATAAAAAACGCACGGTAACACTAAA 

AACAAGCGAGCTCTA/KGCGWNAACAAAGAACCCGTACTGATGGATTCAAGTTTAGCCACTTTTTCTCCCTGCACTTTGG 

1281 ♦ - ♦ ♦ 1360 

TTCTTCCCTCCACATTCCGCWNTTGTTTCTTGGGCATCACTACCTAAGTTCAAATCCGTGAAAAACAGGGACGTCAAACC 
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TATACAAACTAATATTTTTATCCACCACACCCTCGATATTATTTAAACTCACCACACATGCCTCGCyWWJTAC^^ 

1361 ♦ ♦ ♦ H40 

ATATCrrTTaVTTATAAAAATACCTCGTCTCCCACCTATAATAAATTTCACTGCTGTCTACCCACCCTTTCATCTA 

TGAGAGCTTTTTTCCACGGCATTCAGACCCACCATAAAGTTTGACGTATCCCTGATTACCCTTCANNAACCACTACCACC 

1441 ♦ ♦ ♦ ♦ ♦ 1^2.0 

ACTCTCGAAAAAAGGTCCCGTAACTCTGCCTCCTATTTCAAACTCCATAGCGACTAATCCCAACTNNTTCGTGATCGTCC 

ACCGTCATTCAAACCTCTATTGAACGCAATTTTCTTCCCACCCAGCGACACTCCCCTTCCCCACTCGATC 

IS?1 ♦ ♦ ♦ ♦ ♦ * ♦ 1600 

TCGCACTAAGTTTGGAa^TAACTTCCGTTAAAACAACCGTGCGTCGCTCTGACCGCAACGGGTCAGCTACGGATTGACCA 

TAATATCTCCAGCATTAACATCCATAATTTTCACCGAAATCTCTATCATCTGCTGGCCTTGATCTAATTCTGTCATGAGT 

1601 ♦ — ♦ ♦ ♦ ♦ — ♦ 1680 

ATTATAGAGGTCCTAATTCTACCTATTAAAACTGGCTTTACACATAGTAGACGACCCCAACTAGATTAACACACT^^ 

TTCCGATACNNWGCCATATTCGHNNCATAATCACCJU^CGATCACTGCATTCTGCCCTNCCGTCCCCAGCAAACAT 

1681 ♦ ♦ ♦ ♦ n60 

AAGCCTATGNNNCGGTATAACCNNNGTATTACTCCTTCCTACTGACCTAACACCGCANCCCACCCGTCGTTTCTANCCGT 

ATGCCrrGTGTAGCGCGTGAACCATTCTTCNTCGATGACGTCCCGACCCTCCTTTTACTCATCTCACGCAATACACTAACC 

nei * ♦ ♦ ♦ ♦ ♦ ♦ ♦ IB40 

TACGGACACATCCCCCACTTGGTAACAAGNACCTACTGCAGCCCTCCCACCAAAATCAGTAGAGTCCCTTATCTGATTGC 

ACCCCTGCNNAACCACGACCGACTGATCCCCATATTGCTACTCCCTATCCATCGCAGTGGCATACTTAACCCTGTATATA 

1841 - - ♦ ♦ *• ♦ 1920 

TCGCGACCNNTTGGTGCTGCCTCACTACCCCTAT.\ACCATCACCCATAGCTAGCCTCACCGTATCAATTCGCACATATAT 

CrrACACTCACCCCACTGTCrrTTCGTTTGATTAACCCATTATCCAGCACTGAAGCTAATTCACTAATACCACTCAG^ 

1921 ♦ ♦ ♦ * ♦ «• ♦ ♦ 2000 

GAATGTCACTCCCCTGACAGAAAAGCAAACTAATTCCGTAATAGGTCGTGACTTCCATTAACTGATTATGCTC^CTCCCT 

Tn insertion P7G2 
U 

GCTGCCAACACCGCTCACCTCCACAGCTTTGGTACCCCTAATTTCTTTAACCTCCCATCCCCGTGATGAAACGATATTCT 

2001 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 2080 

CGACCCTTGTGGCGAGTGGACCTCTCGAAACCATGCCCATTAAAGAAATTGGAGCGTAGGCCCACTACTTTCCTATAAGA 

GCCTCCCTAACTAATGAATGAftCCCTCCACTAGATAAAATATTGAAACTCATAACCTCATCTTTTAATAACGATGC^ 

2081 ♦ ♦ ♦ ♦ ♦ ♦ 2160

CCGACGCATTCATTACTTACTTGCCAGCTCATCTATTTTATAACTTTCACTATTGGACTACAAAATTATTGC^ 

TATACATATAACATCCTGCCATCAAACCAGGTAACCAAATCATATTCTCCTGCCACGTTATTCAAAATATCGACCGGTGG 

2161 ♦ ♦ ♦ ♦ 2240

ATATGTATATTCTACCACGCTAGTTTGGTCCATTCCTTTACTATAACACGACCGTCCAATAAGTTTTATAGCTGGCCACC 

TCCACGCGGAATTTTTCCACTAAATCTAGCTCTTATCAATCGCCTAATAGTAATACCCCTATCATACTTCTCTCAGAGCA 

2241 ♦ ♦ ♦ ♦ ♦ 2320

ACCTCCGCCTTAAAAAGCTGATTTACATCGACAATAGTTACCCGATTATCATTATCGGCATACTATCAAGAGACTCTCCT 

GATGTNAAAACCTCTGCTAATGGCATTTGTCTCGCATAAACGGTGAAGTCATTACCTTTCCATGATAACTCATC^ 

2321 ♦ ♦ ♦ 2400

CTACANTTTTGGACACGATTAC<XTAAACAGACCCTATTTCCCACTTCACTAATGGAAAGGTACTATTCACTA^ 

TGCTGTATTGACTATAAATACTAAAATTAAGATTAAACGTTTATTTACTACCATTTTATACCCCACCCGAATA.AACm 

2401 ♦ ♦ 2480 

ACGACATAACTCATATTTATCATTTTAATTCTAATTTGCAAATAAATCATGGTAAAATATGGGGTCGGCTTATTTCAAAT 

TGGTGATTCCCTATTACATTTTTTMAAAATGCAAGTTAAACCCAGGTCTTTTTCTATCTCAATAGCAATAACCT 

2481 ♦ ♦ ♦ ♦ ♦ ♦ 2560

ACCACTAACGCATAATCTAAAAAANTTTTACGTTCAATTTCGCTCCACAAAAAGATAGAGTTATCGTTATTCCAGT^ 

TACTACTTCTCGTATAATAACCCTTTAACCATCCCCCATCCGCTGTCACCTCTATACCATAATCATCGACGTCCCCCTCT 

2561 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 2640

ATCATGAACACCATATTATTGCCAAATTGCTACCGGCTACGCCACACTCGACATATCCTATTACTACCTCCAGGCCXACA 

Tn insertion P1IB9 

CCGCAARCRGTAGTCTCAMMTAGCCAAGACAACGCTTAGGTAAGCTTTCCAGGTCATTTAAGAACAAAGAAATAGAAAA^ 

2641 ♦ ♦ ♦ ♦ 2720 

CGCGTTYGYCATCACAGTKKATCCGTTCTGTTCCGAATCCATTCGAAACGTCCACTAAATTCTTGTTTCTTTATCTTTTA

GCTTCTGACAAAATTTCTTCYBHNNNMNNNNNNNNNNNNNHNNNNNNMCATCAATACTCATTATCCAGCATSSKMTWWVH

2721 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 2800

CGAACACTCTTTTAAAGARGRVONNNNNNNNNNNNNNNNNNHNNNNNNGTAGTTATCAGTAATACCTCCTASSMKAWWRK 
NyyK5SSCYSW>C^TMYySWR^ArrTAATGCAATCCCTTTTAAAACTCCCAGCATCAATCCCTCCTCAGACATAAATG^

2801 ♦ ♦ ♦ ♦ ♦ ♦ 2880

NRRMSSSGRSWMTAKRRS>rfWWAATTACCTTACCCAAAATTTTGACCCTCCTACTTAGCCAGGACTCTCTAT^^ 

TTTCTATCAAATTCGCrCACAACCACATCCGTAAA^^GCCTCU^TTCACATTTATTTCCACTATACTCTTCnTGTACAA 

2881 ♦ ♦ ♦ ♦ ♦ ♦ 2960 

AAAGATACTTTAACCGACTGTTCCTCTACCCATTrrTCG<y^CTAACTCTAAATAAACCTCATATCACAAC^ 

TCAGCATGCTGTCTACATATACCTTCTCACACGCGATTCTATCATTCG<WTTTTCCCATAAATTNfWC^ 

2961 ♦ ♦ ♦ ♦ 3040

ACTCCTACGACAGATCTATATCGAACACTGTCCGCTAAGATAGTAACCCrAAAACCCTATTTAANKKCTTAATCTAAAAC 

AGCATTGACATAAAAACTTACAATTTGNAAAATTATTTATTAAATAAACTGTTACGATCTTTTTACATC^^ 

3041 ♦ ♦ 3120

TCCTAACTCTATTTTTCAATCTTAAACKTTTTAATAAATAATTTATTTCACAATCCTACAAAAATCTAGCG^^ 

AAAAAGTAATTGTACTCATCGACTNCCTTATATATGAACAAATTTATCTTCCTAATCATAACACCATCCATTAATCWW^ 

3121 ♦ ♦ ♦ ♦ ♦ ♦ 3200

TTTTTCATTAACATCAGTAGCTCAHCCAATATATACTTCrrTAAATAGAACCATTACTATTGTCCTAGCTAATT 

GATGAAACTATATCTACTGCGATAGTGATCAACTGCCAAACATTTTGCAACACGCAACTCGACGGAAGCATTATGAATTT 

3201 ♦ ♦ ♦ ♦ 3280

CTACTTTCATATACATGACGCTATCACTAGTTCACGCTTTCTAAAACGTTGTCCGTTGACCTCCCTTCCTAATACTTAAA 

SSTCAATCTCAAGAATACSSYSYRNNNNNKTCTTTAGTAATCAGGCTAACTrrTTTATTTTTATTA.^CAACA^ 

3281 ♦ ♦ ♦ ♦ 3360

SSACTTAGAGTTCTTATGSSRSRYNNNNNHAGAAATCATTAGTCCCATTGAAAAAATAAAAATAATTGTTCTTATTAAWA

TTGGCTCCTATCTCTGCTTACCGCACCTTATATATCAATGGTTCRGAAACGGCAGCATATAATAGAGCATTTATCCCTTC 

3361 ♦ ♦ ♦ ♦ ♦ ♦ 3440

AACCGACGATAGACACCAATGGCGTCGAATATATAGTTACCAACYCTTTGCCGTCGTATATTATCTCCTAAATAGGCA\C 

TATCCGAGATGAATATTGTACTAACCAATCAACGGTTTGAAGAAGCTGAACGTGACCCTAAA^ATTTAATCTATCAATCC 

3441 ♦ ♦ ♦ ♦ ♦ 3520

ATACGCrrCTACTTATAACATCATTCCTTAGTTCCCAAACTTCTTCGACTTGCACTGCGATTTTTAAATTACAT^^ 

TCATTAGCGACTGAGATTCATCATAACGATATTTTCCCT<UGGTGAGCCGGCATCTATCTGTCCGTCCTTC^^ 

3521 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 3600 

AGTAATCGCTGACTCTAACTACTATTGCTATAAAACGGACTCCACTCGGCCGTACATAGACAGCCAGGAAGTTTAACGTG 

MGCCCACCCTN.^ACGGAGAGAACCACCCTCTCTTTCTGCAGTCCTCTGATATCGATGAAAATACCTTTCGTCCCGATA^ 
3601 ♦ 3680

KCGGCTGCGANTTCCCTCTCTTCGTCCCAGACAAACACCTCAGCACACTATACCTACTTTTATCGAAAGCAGCGCTATCA 

Tn insertion P3r« 

TTTATTCTTAATCATAAAAATGAGATTTCCTTATTATCTACTGATAACCCTTCAGATTATTCAACTCTACAGCCm 

3681 ♦ ♦ ♦ ♦ ♦ ♦ 3760

AAATAAGAATTACTATTTTTACTCTAAACCAATAATAGATCACTATTGGGAAGTCTAATAAGTTGAGATGTCCGAAATTC 

GCGAAAAAGCTTTCCTTTATACCCAACCCATCCCCGGTTTTACTCGACTCAACCACAATACATAAACCGCAAACGATTO 

3761 3840

CGCTTIrTCGAAACCAAATATGCGTTGGCTACGGCCCAAAATGACCTCACTTGGTCTTATCTATTTGCCGTTTCCTACCG

AACGCTTCCCTTGCCGTTCCCGATCACGCAACCCCTATTTTTTCAGCTGACGCTTAAACTTCCCGATCTCATTACTAAGA 

3841 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 3920

TTCCGAACGCAACGCCAACG<KrrAGTCCCTTCCCCATAAAAAACTCCACTCCCAATTTGAAGCCCTAGy^ 

GCCACCTGCCATT.AGATGATAGTATTCCACTATCCCTCCATCAAAACAACCACTTATTCCCGTTTTCATACATCCCCCCA 

3921 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 4000 

CGCTCGACGCTAATCTACTATCATAACCTCATACCGACCTAGTTTTGTTGCTCAATAACCCCAAAAGTATGTAGGGCCGT 

AAAAATACCTACACACTTAGAAAATCTAACGCTCCATCATGGATGCCAGCAAATTCCCCGATrrCTCATATTACGCACAA 

4001 ♦ ♦ ♦ ♦ ♦ ♦ 4080

TTTTTATGCATGTCTOVATCTTTTACATTGCGACGTACTACCTACCGTCCTTTAACCGCCTAAACACTATAATCOT 

CCTTCCATGGCCCCGGATCGACTCTGGTTACGCTCTACCCATACCGTAATCTACATAATCGCATCTTAAAAATTATCCTT 

4081 ♦ ♦ ♦ 4160

GGAACCTACCCCCCCCTACCTCACACCAATGCGACATCGCTATCCCATTACATGTATTAGCCTAGAATTTTTAATACGAA 

CAACAAATCCCCTTTACATTAACAGCATTCGTGTTGATGACGTCCCCTTTTTGCTCCTTACTACATCGCTCACT^ 

4161 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 4240

GTTCTTTAGGGGAAATCTAATTGTCGTAACCACAACTACTGCAGCCGAAAAACGACCAATGATCTAGCGAGTGACCGGTT 
ACCCa'TATCX;CCTTTTCTCCATCTC:ATTAATMAACCCCA^CTCCACCCCTGACCACACCTTTACCAC^

4241 ♦ 4320

TGCCAATACCCCAAAACACCTACACTAATTATTTTGCCCrrCACGTCCCGACTCCTCTCCAAATCCTCCT 

ATCAATTACATACTATTCCCGCTCCTTTTAACCAACTCCTTGATACTCTACAACTCCAATACCACAATCTGCAAAACAAA

4321 4400

TACTTAATCTATCATAACCCCCACGAAAATTCGTTGACCAACTATGACATCTTCACCTTATGCTCTTACACCT^ 

GTCGCAGACGCACCCACGCCCTAAATGAAGCAAAAAAACGCCCTGAGCNAGCTAACAAACCTAAAAGCATTCATm 

4401 ♦ ♦ ♦ ♦ ♦ ♦ 4480

CAGCGTCTCCGTGGGTCCGCGATTTACTTCCTTTTTTTGCCCGACTCCWCGATTCTTTGCATTTTCGTAACT^ 

CTAATAACTCATGAGTTACGTACTCCCATGAATGCCCTACTCGGTGCAATTGAATTATTACAAACCACCCCTTTAAAC^^ 

4481 ♦ ♦ ♦ ♦ 4560

CATTATTCACTACTCAATGCATGAGCCTACTTACCGCATGAGCCACCTTAACTTAATAATCTTTGGTCGCGAAAm 

ACyKGCAACAAGGATTACCTGATACCCCCAGW^TTCTACACTCTCTTTCTTAGCTATTATTAATAATCTCCTCCAT^ 

4561 ♦ ♦ ♦ ♦ 4640

TCrCCTTCTTCCTAATCGACTATGCCGGTCTTTAACATGTGACAGAAACAATCGATAATAATTATTAGACCACCTAAAAA 

CACCCATCGAGTCTGCTCATTTCACATTACATATCCAAGAAACAGCCTTACTCCCGTTACTGCACCAGGCAATCCAAACC 

4641 ♦ ♦ 4720

CTGCGTACCTCAGACCAGTAAAGTGTAATGTATACCTTCTTTGTCGCAATGACGCCAATGACCTGGTCCGTTACGTTTGG 

ATCCACGCGCCAGCCCNAAACCAAAAAACTGTCATTACCTACTTTTGTCGGTCAACATGTCCCTCTCTATTTTCATACCG 

4721 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 4800 

TAGGTCCCCGGTCGCGNTTTCGTTTTTTGACAGTAATGCATGAAAACAGCCAGTTGTACAGGGAGAGATAAA^.GTATGCC 

ACAGTATCCGTTTACNNCAA.^TTTTGCTTAATTTACTCGGGAACGCGGTAAAATTTACCGAAACCGGAGGATACCTCTGA 

4801 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 4880

TGTCATAGGCAAATGNNGTTTAAAACCAATTAAATGAGCCCTTCCGCCATTTTAAATCGCTTTGGCCTCCTATGCACACT 

CGCTCAACCGTCATGAGGAACAATTAATATTTCTGGTTAGCGATAGCGGTAAAGGGATTGAAATACAGCACCACTCTCAA 

4881 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 4960

GCCAGTTCGCAGTACTCCTTGTTAATTATAAAGACCAATCGCTATCGCCATTTCCCTAACTTTATGTCGTCCTCAGACTT 

ATCTTTACTCCTTTTTATCAAGCAGACACAAATTCGCAAGCTACACGAATTCGACTCACTATTCCGTCAAGCCT 

4961 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 5040

TACAAATGACGAAAAATAGTTCGTCTGTGTTTAACCCTTCCATCTCCTTAACCTGACTGATAACGCAGTTCGGACCCATT 

AATGATGCCCGGTAATCTGACACTAAAAACTCTCCCCCCGGTTGCAACCTGTCTCTCGCTAGTATTACCCTTACAAGAAT 

5041 ♦ ♦ ♦ ♦ ♦ ♦ 5120

TTACTACCCGCCATTAGACTGTGATTTTTCACACGGGCCCCAACCTTGGACACAGAGCGATCATAATCCGAATGTTCTTA 

Tn insertion 

ACCACCCCCCTCAACCAATTAAAGGGACCCTGTCACNNNCCGTTCrCCCTGCATCCGCAACTGGCTTCCTCGGCAA^ 

5121 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 5200

TGGTCCGCGGAGTTGGTTAATTTCCCTGCGACAGTCNNNGCCAACACCGACCTAGCCCTTCACCGAACGACCCCTTATGC 

CCGTGAACCACCCCACCAGCAAAATGCCCTTCTCAAHMCNAGAGCTrrTCTATTTCTCCGGAAAACTCTAC 

5201 ♦ ♦ 5280 

GCCACTTCCTGGGGTCGTCGTTTTACCCCAAGAGTTKMGMTCTCCAAAACATAAACAGCCCTTTTGACATCCTGGACCGC 

CAACAGTTAATATTGTGTAC.^CCAAATATGCCACTAATAAATAATTTCTTACCACCCTGCCAGTTGCAGATTCTTTTGGT 

5281 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 5360

GTTGTCAATTATAACACATCTCCTTTATACGCTCATTATTTATTAAACAATCCTGCGACCaTCAACCTCTAACAAAACCA 

TGATCATGCCCATATTAATCGGGATATCATCCCCAAAATGCTTGTCACCCTGCGCCAACACCTCACTATTCCCCCCACTA 

5361 ♦ ♦ ♦ ♦ ♦ 5440

ACTACTACGCCTATAATTAGCCCTATACTACCCGTTTTACGAACAGTCCGACCCCGTTGTGCAGTCATAACGGCGCTCAT 

CTAACCACCCTCTCACTTTATCACAACACCAGCCATTCGATTTACTACTCATTCACATTAGAATGCCACAAA 

5441 ♦ ♦ ♦ ♦ ♦ 5520 

CATTCCTCCGAGACTGAAATAGTGTTCTCCTCGCTAACCTAAATCATGACTAACTGTAATCrTACCGTCTTTATCTACCA 

ATTGAATGTCTACCATTATGCCATGATGAGCCGAATAATTTACATCCTGACTCCATCTTTGTGGCACTATCCGCTACCGT 

5521 ♦ ♦ ♦ ♦ ♦ ♦ ♦ 5600

TAACTTACACATGCTAATACCGTACTACTCGGCTTATTAAATCTAGGACTCACCTACAAACACCGTGATACCCGATCGCA 

ASCWMAGAWRVmivn'CRTYCTDDAAAAAAWRDGRKDHVaCATKAYANWTTACAAAACCAGTGACATTGCCTACCTTA^ 

5601 ♦ ♦ ♦ 5680

TSCBNKTCTWYWAKWAGYARCAHHTTTTTTWYHCYMHWAGTADTRTNNAATGTTTTGGTCACTCTAACCCATCGAATCC 
TCCCTACATCAGTATTCCCCCAGWACCAACTTTTACGAAATATAGAGCTACAGGACCAGCATCC

5681 ♦ ♦ ♦

ACayVTCTACTCATAACGGCCTCTTATGCTTGAAAATCCTTTATATCTCGATGTCCTCCTCCTACG 